
The Seventh Army Conference on Applied Mathematics and Computing was held at
the U.S. Military Academy, West Point, New York, on 6-9 June 1989. This is the
second time the Military Academy has served as the host for this series of Army
conferences. For each of these meetings we were fortunate to have the heads of
the Department of Mathematics as Chairpersons on Local Arrangements. This year
Colonel Frank Giordano served in this capacity. He was assisted in this task
by Lieutenant Colonel David Arney and Captain Suzanne Swann. These individuals
are to be commended for their efforts in coordinating all the details required
to conduct this large and successful scientific meeting.

This conference was attended by more than 80 scientists and engineers
representing academia and various Army agencies. The meeting featured seven
invited speakers. These general talks covered several topics of current
interest, including multi-scale methods and wavelet transforms, high
performance computing, phase transformations, multivariate splines, and
stochastic control. The names of these speakers, together with the titles of
their addresses, are listed below. The second part of the program consisted of
special sessions on topics such as stochastic methods for image analysis,
mathematical issues in computer science, computational methods for multibody
dynamics, and mechanics of large deformations. In addition, about 40
contributed papers were presented by both Army and academic participants.
SPEAKER AND AFFILIATION

Professor Alan S. Willsky, Massachusetts Institute of Technology
Estimation of Spatially-Distributed Processes

Microstructure of Crystals Undergoing Phase Transformation

Modelling Microstructure by Energy Minimization

High Performance Computing

Theory and Application of Piecewise-Deterministic Processes

Wavelet Transforms

What's New in Multivariate Splines?


One of the sessions at this conference was called "Mathematics at West Point."
In it, members of the Department of Mathematics outlined a new program for the
cadets entitled "USMA's Mathematics Program in 1990 and Beyond." The first
article in these proceedings is devoted to this curriculum.

This conference is part of a continuing program of Army-wide symposia held
under the auspices of the Army Mathematics Steering Committee (AMSC) to promote
better communication between Army scientists and the Army Research Office
investigators. In order that this mission be accomplished, a large number of
scientists had to expend a great deal of effort. The members of the AMSC would
like to thank all these individuals for their excellent presentations and their
valuable contributions to the field of science.
West Point, New York 10996-1786


ABSTRACT: The following paper stemmed from a special
presentation by the authors at the Seventh Annual Army
Conference on Applied Mathematics and Computing. The
presentation described the background of, motivation for,
and broad content of the new core mathematics program for
all cadets starting in the Fall of 1990: discrete
dynamical systems, calculus, and probability and statistics.
The United States Military Academy (USMA) is the single
largest source of Army officers with mathematics,
science, and engineering backgrounds. It is therefore important to
inform Army mathematicians and scientists of these
curricular developments and of the department's research
program.

1.  Introduction. 

Mathematics is the language of science. Continuous
mathematics, especially calculus, has been the cornerstone
of undergraduate education in the sciences. First models of
a behavior are often continuous. We need to continue to
teach continuous mathematics since we all, students
especially, gain great insight from the closed-form
solutions to continuous models that calculus affords, even
when these models oversimplify reality. These first models
assume the world is linear, continuous, and deterministic.
More often, it is nonlinear, ultimately discrete, and
usually stochastic. Discrete mathematics is not only the
language of the discrete world but also the language of
the computer. Probabilistic mathematics is the language of
uncertainty. The study of all these fundamental areas of
mathematics provides a much better basis from which to view and
model our world.

The order of presentation (discrete dynamical systems,
calculus, and probability and statistics) is important.
Discrete mathematics, progressing from algebra to matrix
algebra to discrete dynamical systems, is a better transition
from high school mathematics and can be used to preview the
more difficult concepts (for example, the limit) that
underlie continuous mathematics. Finally, probability is
based on both continuous and discrete mathematics. With
recent advances in textbooks and software, and with each cadet
possessing a computer (portable in 1990), we at USMA are in
an unprecedented position to present an integrated four-course
curriculum treating the fundamental ideas of
discrete, continuous, and stochastic mathematics.

We feel that an integrated curriculum will permit us to
develop the following attitudes in cadets that will carry
over into their careers as officers:

- Mathematics is deductive in character. A few
principles must be internalized but most notions are
derived.

- Mathematics is a medium of communication in which
ideas are formalized and through which theories are
synthesized.

- Curiosity and experimental disposition are essential
characteristics of mathematics education. Through
observation one seeks universal truths and establishes
them by proof.

- Learning mathematics is an individual responsibility.
Textbooks, instructors, and members of study groups
only facilitate the process.

- Mathematics is useful.

In the next three sections we describe in more detail
the plans for each of the three courses in the four
semesters of mathematics: discrete dynamical systems, the
calculus, and probability and statistics. In designing our
curriculum, we have taken into account several national
reports on the current status of mathematics, calculus in
particular, and the changes needed to improve the status of
our educational program [1,2].

In the final section we briefly describe the research
program to which the Department of Mathematics subscribes for
tenured and non-tenured faculty as well as for cadets. We
see this program as the ultimate capstone of the new
program. Problem solving is aggressively encouraged by
providing ample opportunities to solve meaningful practical
problems requiring the integration of fundamental ideas
from one or more lesson blocks of one or more core
mathematics courses.
Under USMA's proposed curriculum, the first core
mathematics course is MA 103: Discrete Dynamical Systems
with Matrix Algebra. This is a 3-credit-hour course. It
provides introductions to elementary matrix operations and
matrix methods to solve systems of linear equations.
Several applications of these subjects are also studied.

Most of the course is devoted to topics and problems in the
mathematics of discrete dynamical systems. Introductory
material on modeling problems using difference equations
motivates the study of solution techniques for these
equations and the eventual study of calculus and
differential equations. Concepts and techniques are
discussed for first-order linear and nonlinear equations,
higher-order linear equations, and systems of equations.
Computer software is used to demonstrate and solve problems
in both the matrix algebra and the discrete dynamical
systems sections of the course.

While the placement and scope of our course in the
curriculum may be unique, we feel that this will be the
ultimate role of a discrete mathematics course. As stated
by Maurer in [3], "there may yet be a move toward more
discrete math in the first year." We intend to lead the way
in designing, testing, and teaching a discrete course for
the first semester of college mathematics. In order to
start this course in the 1990-1991 academic year, we will
have to piece together textual material and write some
ourselves [4,5].

There are several reasons to begin our curriculum with
such a course. It provides a logical transition from high
school to college mathematics and provides an intuitive
motivation for the limiting concepts of the calculus.
Discrete mathematics also is the language of the computer,
and difference equations provide an intuitive introduction
to recursion. Discrete models, many in the form of
difference equations, are popular models of dynamic behavior
and are worthy of increased study.
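To make the link between difference equations and recursion concrete, here is a minimal Python sketch (our illustration, not course material) of the first-order linear equation $a_{n+1} = r a_n + b$. It computes each term recursively and checks it against the closed-form solution $a_n = r^n (a_0 - a^*) + a^*$, where $a^* = b/(1-r)$ is the equilibrium for $r \neq 1$. The function names and the sample values $a_0 = 100$, $r = 1/2$, $b = 10$ are arbitrary choices.

```python
from fractions import Fraction

def iterate(a0, r, b, n):
    """Compute a_n by applying the recursion a_{k+1} = r*a_k + b n times."""
    a = a0
    for _ in range(n):
        a = r * a + b
    return a

def closed_form(a0, r, b, n):
    """Closed-form a_n = r**n * (a0 - a_star) + a_star, valid when r != 1."""
    a_star = b / (1 - r)          # equilibrium: the fixed point of a = r*a + b
    return r**n * (a0 - a_star) + a_star

# Exact rational arithmetic lets the two methods be compared with ==.
a0, r, b = Fraction(100), Fraction(1, 2), Fraction(10)
for n in range(6):
    print(n, iterate(a0, r, b, n), closed_form(a0, r, b, n))
# last line printed: 5 45/2 45/2
```

Because $|r| < 1$ here, the terms approach the equilibrium $a^* = 20$, an early glimpse of the limit concept the course uses to motivate calculus.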

Some of the goals for the students in the course are:
ability to formulate discrete mathematical models; ability
to solve algebraic and discrete models; motivation for the
calculus; and internalization of a few principles of
mathematics. We hope to use this course to develop the
following attitudes early in the curriculum: curiosity and
experimental disposition; a desire to structure and
communicate quantitative ideas; an appreciation for
mathematics as a useful tool to solve real problems; and an
appreciation for the power of deductive reasoning.


The first block of 12-15 lessons covers matrix algebra.
The major topics in this block are the basic concept of
linearity, matrix operations, determinants, inverses, Markov
processes, and linear programming. The second block, on
difference equations, covers first-order theory and
applications, second-order theory and applications, first-order
systems, Markov chains, and nonlinear difference
equations. The final block establishes a foundation for
calculus by introducing sequences and difference quotients.
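The Markov material in both blocks rests on one idea: a first-order system $x_{k+1} = P x_k$, where $P$ is a transition matrix. The Python sketch below iterates such a system for a two-state chain whose probabilities are invented for illustration; repeated matrix-vector products drive the starting distribution toward the steady state, which for this $P$ is $(2/3, 1/3)$.

```python
def step(P, x):
    """One step of x_{k+1} = P x_k. Column j of P holds the probabilities
    of moving from state j to each state, so each column sums to 1."""
    return [sum(P[i][j] * x[j] for j in range(len(x))) for i in range(len(P))]

# Hypothetical two-state chain: state 0 keeps 90% and sends 10% to state 1;
# state 1 keeps 80% and sends 20% back to state 0.
P = [[0.9, 0.2],
     [0.1, 0.8]]

x = [1.0, 0.0]                   # start with everything in state 0
for k in range(50):
    x = step(P, x)

print([round(v, 4) for v in x])  # [0.6667, 0.3333]
```

The same loop, written as repeated multiplication by a matrix, is exactly a first-order linear difference equation in vector form, which is why the course can treat Markov chains in either block.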

One unifying feature of all the blocks in this course is
the use of the computer. The computer will be used both to
demonstrate concepts in class and as a tool to solve
problems. USMA, and the Department of Mathematics in
particular, have had an aggressive program of computer-assisted
instruction for several years [6]. This course
intends to establish the foundation for computer use by
cadets in solving problems of a mathematical or scientific
nature and to establish the computer as a tool for
mathematical experimentation.

3. Lean and Lively Calculus.

3.1. Background.

From the 1950s through 1974 the core mathematics
program at USMA was a strong and stable program in
undergraduate mathematics, both in content and in credit. In
four semesters each cadet received the equivalent of six
courses in mathematics: single-variable (integral and
differential) calculus, multivariable calculus, linear
algebra, ordinary differential equations, and elementary
probability and statistics. Cadets attended class six days
a week for 17 weeks a semester at 80 minutes per day. All
of the textual materials were written at USMA, either
directly or under the supervision of the Chairman of the
Department of Mathematics, Charles P. Nicholas.

Since the mid-1970s there has been a constant and
steady erosion of the depth and breadth of coverage in the
core mathematics program at USMA. Part of this was the result of
offering academic majors in non-science and non-engineering
fields. Regardless of the rationale for the reduced
emphasis on mathematics, the effects were the same: by the
end of the 1980s the core mathematics program had been reduced
by 30%. Unlike many other schools, however, USMA has
maintained an emphasis on mathematics by keeping four
mathematics courses in its core curriculum [7].

The resulting programs never reached a steady
state. Topics would appear, disappear, and reappear from
semester to semester. Conceptual development was replaced
entirely by the learning of algorithmic skills. There was
no real plan.
In 1984 the Chairman of the Department of
Mathematics received a report from the tenured faculty that
recognized a need to change the core mathematics program at
USMA and to take advantage of the technological advances in
computers and symbolic manipulation. However, textbooks
were not available (and essentially still are not), and there was
neither the authority nor the time to write these textual materials at
USMA. Therefore, little was changed.

However, since the beginning of the National
Reform Movement in Calculus in 1987 there has been
increasing interest in mathematics education in many sectors
[1], [2]: publishers, authors, professors, computer
scientists, and students. It is against this backdrop that
the West Point version of the "Lean and Lively" Calculus is
being developed.

3.2. Course Description - Calculus I and II.

These are the second and third courses of the
mathematics core curriculum, and each is 4.5 credit hours.
These standard courses provide the study of mathematics as an
intellectual discipline and as a foundation for continued
study of mathematics and for the subsequent study of the
physical sciences, social sciences, and engineering.
Beginning with functions and the sequential development of
the limit, the calculus is covered through the development
and evaluation of multiple integrals. No vector calculus is
included. Ordinary differential equations are integrated
into the course as soon as higher-order derivatives are
covered. Computers and symbolic manipulation are integrated
throughout the program to foster both discovery and
intellectual curiosity and to enhance problem solving.

3.3. Objectives of the Calculus Sequence.

There are four basic objectives to the study of
calculus which support the overall objectives of the
mathematics curriculum at USMA:

a. Students learn the three basic limit ideas of
calculus: the limit of a convergent sequence is related to
the concept of a continuous function; the limit of a
quotient is related to a derivative; the limit of a sum is
related to the definite integral.

b. Students are able to prove some of the basic
results in the calculus.

c. Students are able to formulate ideas in the
mathematics of the calculus.

d. Students are able to solve problems using
calculus by formulating the models and applying the
appropriate techniques and algorithms.
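The three limit ideas in objective (a) can each be watched numerically. The following Python sketch, an illustration with arbitrarily chosen functions rather than an exercise from the course, evaluates a sequence, a difference quotient, and a Riemann sum as the relevant parameter is refined. It only displays convergence; it does not replace the proofs called for in objective (b).

```python
import math

def difference_quotient(f, x, h):
    """(f(x + h) - f(x)) / h; its limit as h -> 0 is the derivative f'(x)."""
    return (f(x + h) - f(x)) / h

def riemann_sum(f, a, b, n):
    """Left-endpoint sum on n equal subintervals; its limit as n grows
    is the definite integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx

# 1. Sequences and continuity: x_n = 1/n -> 0, and exp is continuous,
#    so exp(x_n) -> exp(0) = 1.
for n in (10, 1000, 100000):
    print("exp(1/n), n =", n, math.exp(1 / n))

# 2. Quotients and the derivative: the derivative of sin at 0 is cos(0) = 1.
for h in (0.1, 0.01, 0.001):
    print("difference quotient, h =", h, difference_quotient(math.sin, 0.0, h))

# 3. Sums and the integral: the integral of x^2 over [0, 1] is 1/3.
for n in (10, 100, 1000):
    print("Riemann sum, n =", n, riemann_sum(lambda x: x * x, 0.0, 1.0, n))
```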

3.4. A "Lean" Calculus.

Two problems contribute to the need for a new "lean"
calculus. Both problems lie in the size of the calculus
text. Calculus books are too large! Even though there are
only three principal ideas, there are typically 16-19
chapters of material. What once were applications or
examples have been elevated to the status of independent
topics. Thus problem one is the growth of "important"
topics.


The second problem is the reluctance to remove
outdated or irrelevant material from the textbooks. Many
topics that are purely algorithmic by nature, and easily
implemented with a computer, are still being drilled and
memorized in calculus classrooms.

There are two approaches that can be taken in deriving
this new lean calculus: the butcher's approach and that of
the sculptor.

The butcher's approach is relatively easy to
implement and requires no new textbooks. Essentially, the
topics of the textbook are divided between baseline and
enhancement. Every student of calculus does the baseline
and some percentage of the enhancement, depending on
background, instructor preference, etc. This idea has
essentially been implemented by Scott Foresman Publishers
for Calculus and Analytic Geometry by Al Shenk. On the
surface this approach sounds like little improvement, and some
agreement across colleges over what is baseline and what is
enhancement would be required.

There are, however, Computer Algebra Systems (CAS)
that can support a butcher's approach independent of the
choice of textbook. CAS is the new technology that would
make this approach a major improvement over the existing
programs. A CAS performs symbolic manipulation, including
symbolic integration and differentiation, in either a hand-held
calculator or computer software.
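To give a feel for what "symbolic" means here, the toy Python sketch below differentiates and integrates polynomials exactly, storing each one as a coefficient list and applying the power rule. It is a deliberately minimal stand-in and says nothing about how DERIVE or any other CAS is actually implemented; real systems handle far more general expressions.

```python
from fractions import Fraction

# A polynomial is its coefficient list: [c0, c1, c2, ...] means
# c0 + c1*x + c2*x^2 + ...

def differentiate(p):
    """Power rule term by term: d/dx c*x^k = k*c*x^(k-1)."""
    return [k * c for k, c in enumerate(p)][1:]

def integrate(p, constant=0):
    """Reverse power rule term by term, with a chosen constant of integration."""
    return [Fraction(constant)] + [Fraction(c) / (k + 1) for k, c in enumerate(p)]

def to_string(p):
    """Render highest-degree term first, skipping zero coefficients."""
    terms = []
    for k, c in enumerate(p):
        if c == 0:
            continue
        terms.append(str(c) if k == 0 else (f"{c}*x" if k == 1 else f"{c}*x^{k}"))
    return " + ".join(reversed(terms)) or "0"

p = [1, 0, 3, 2]                        # 2x^3 + 3x^2 + 1
print(to_string(p))                     # 2*x^3 + 3*x^2 + 1
print(to_string(differentiate(p)))      # 6*x^2 + 6*x
print(to_string(integrate(p)))          # 1/2*x^4 + 1*x^3 + 1*x
```

Exact fractions keep the results symbolic in spirit: no rounding enters, so differentiating an antiderivative returns the original coefficients.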

CAS is not a crutch to do for students what they
should be able to do for themselves. CAS is a force
multiplier that makes for a more efficient use of study time
for the student and allows professors to change course
priorities. Much of the time that is spent on drill and
memorization is eliminated. Topics and problems that were
not accessible before can now be explored using CAS and
other computer support.
The sculptor's approach to a "lean" calculus will
be a work of art and is probably still a couple of years in
the making. Central to this approach are new textual
materials which incorporate several major differences from
the same "ole brewski."

Emphasis must shift to conceptual understanding
and problem solving and away from memorization and drilling
on formulas. Writing and the interpretation
of results should be emphasized over the mere production of
results. Differential equations should be integrated
throughout the calculus textbook instead of being treated in
isolation. Finally, CAS and other computer capabilities
should be integrated into the text to capitalize even more
on the new technology.

3.5. A "Lively" Calculus.

Many ideas for implementing a "lean" calculus
exist. What seems to be more difficult is the question of
how to "liven up" the calculus program. We look to
relevance and experimentation. We intend to emphasize the
relevance of calculus to solving problems, through both
motivational and carry-over problems. We also intend to emphasize
experimentation and the discovery of new techniques to solve
interesting but previously unsolvable problems.

Several special problems have already been
developed for use in the calculus program that emphasize
integration and the modeling and solution of differential
equations. New carry-over problems are being developed in
probability and statistics as well as in optimization and
economics.


Computers, CAS, and specialty software will play
two major roles in the lively calculus. The use of
computational software opens up a wider variety of problems
that are more realistic and interesting for students to
solve. The student is also much more inclined to explore
the nature of functions and discover their properties with
the use of computers.

Cadets at USMA currently own The Calculus Toolkit,
the Midshipman's Plotting Package, and DERIVE, a CAS for IBM-compatible
PCs.
4. Probability and Statistics.


The USMA probability and statistics course is the
capstone of the mathematics required of all cadets and is 3
credit hours. In negotiating this course we expect students
to show sophistication and technical maturity. When
students come to us from their precollege experience, their
learning is essentially skill based, and the pedagogy and
material at the beginning of our core curriculum reflect
this as a point of departure. Our curriculum is designed to
gradually wean students from this learning approach,
culminating in this final course, probability and
statistics, in which the learning is wholly cognitive; very
little time and class reward is devoted to skill learning.
The issues confronted by the student are not closed form and
require him to interpret his mathematical manipulations.

Because the course is conceptual in character, we
emphasize the unified structure of the study of uncertainty.
Computation is pushed off to software (currently MINITAB).
Learning is Socratic in character; students are directed in
such a way that they "discover" the two distributions that
form the center of the course, one discrete and one
continuous. It is our goal that students internalize the
idea that once an issue can be modeled by a random variable
and its distribution, one has a complete gauge of the
inherent uncertainties. While only two distributions are
formally developed in class, students are expected to lift
the essence of a distribution to other functions, to
generalize the concepts, and to apply them to problems other
than the two 'learning examples.'

Our transition to statistics appeals to the intuitive
notion that the character of a population can be forecast
from a suitable subset of the population. The notion of
"sample space", first discussed in probability, is replaced
with the space of all subsets of a fixed size ("the sample
space of samples of size n"). The student observes that
measures taken on these samples meet the definition of
random variables on this new sample space. At this point
the structure of the course quickly narrows his
consideration to two such measures: mean for central
tendency and variance for spread. This approach causes
students to take the perspective that the most important
need in interpreting a sample outcome is to characterize
the distribution of these measures.
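The "sample space of samples of size n" can be built explicitly for a small population. In the Python sketch below, the population of six numbered slips and the sample size n = 3 are hypothetical, and samples are taken as subsets (without replacement) so that each of the 20 subsets is equally likely. The sample mean is then a random variable on this space whose distribution can be tabulated directly; its average equals the population mean, and its variance (7/12) is smaller than the population's (35/12).

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from statistics import mean, pvariance

population = [1, 2, 3, 4, 5, 6]     # hypothetical numbered slips
n = 3                               # hypothetical sample size

# The new sample space: every subset of size n, each equally likely.
samples = list(combinations(population, n))

# The sample mean is a random variable on that space (kept as exact fractions).
sample_means = [Fraction(sum(s), n) for s in samples]
distribution = Counter(sample_means)

print(len(samples), "equally likely samples")                           # 20 equally likely samples
print("mean of sample means:", mean(sample_means))                      # 7/2
print("population mean:", Fraction(sum(population), len(population)))  # 7/2
print("variance of sample means:", pvariance(sample_means))             # 7/12
print("population variance:", pvariance(population))
for value in sorted(distribution):
    print(value, distribution[value], "/", len(samples))
```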

This perspective leads to the study of the Central
Limit Theorem. The result of the theorem is motivated
experimentally, first by mechanical means (e.g., drawing
numbered slips out of a container) and then through computer
simulation. It is beyond the scope of the course to provide
an analytic proof of the theorem, but students gain a strong
intuitive appreciation of the result.
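A computer version of the slip-drawing experiment might look like the following Python sketch. The slips numbered 1 to 10, the sample size of 30, the 10,000 repetitions, and the seed are all assumptions chosen for illustration, and slips are drawn with replacement. If the Central Limit Theorem is doing its work, the sample means should center on $\mu$, spread with standard deviation close to $\sigma/\sqrt{n}$, and fall within $1.96\,\sigma/\sqrt{n}$ of $\mu$ roughly 95% of the time.

```python
import random
from statistics import mean, pstdev, stdev

random.seed(1989)                  # fixed seed so runs are repeatable
slips = list(range(1, 11))         # hypothetical slips numbered 1..10
mu, sigma = mean(slips), pstdev(slips)

n, trials = 30, 10_000
# Each trial draws n slips with replacement and records the sample mean.
means = [mean(random.choices(slips, k=n)) for _ in range(trials)]

se = sigma / n ** 0.5              # spread of the sample mean predicted by the CLT
inside = sum(abs(m - mu) <= 1.96 * se for m in means) / trials

print(f"average of sample means: {mean(means):.3f} (mu = {mu})")
print(f"sd of sample means:      {stdev(means):.3f} (sigma/sqrt(n) = {se:.3f})")
print(f"share within 1.96 SE:    {inside:.3f} (normal approximation: 0.95)")
```

Plotting a histogram of `means` would show the familiar bell shape even though the slips themselves are uniformly distributed.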
Establishing the distribution of the variance is less
elegant. It is our assessment that the knowledge required
to logically develop the relationship between the
distribution of the variance and the corresponding
chi-squared distribution would demand an effort all out of
proportion to the gain in a one-semester course. Thus the
transformation of variables is simply given as an analogue of
the Z-transform with which students have become familiar.
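For reference, the two standardizations being paired are stated below in standard textbook form. The chi-squared statement holds exactly when the population is normal; the $Z$ statement is exact for a normal population and otherwise the large-sample approximation from the Central Limit Theorem. Here $\chi^2_{p,\,n-1}$ denotes the $p$-quantile of the chi-squared distribution with $n-1$ degrees of freedom; this notation is ours, not the course's.

```latex
Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \;\approx\; N(0, 1),
\qquad
\frac{(n-1)\,S^2}{\sigma^2} \;\sim\; \chi^2_{n-1},
\quad\text{where } S^2 = \frac{1}{n-1}\sum_{i=1}^{n} \bigl(X_i - \bar{X}\bigr)^2 .

% Inverting the second statement gives a 100(1 - \alpha)% interval for \sigma^2:
\frac{(n-1)\,S^2}{\chi^2_{1-\alpha/2,\,n-1}} \;\le\; \sigma^2 \;\le\; \frac{(n-1)\,S^2}{\chi^2_{\alpha/2,\,n-1}} .
```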

Central to the course is a case study that, in an actual
example, reviews and reinforces all the ideas discussed. In
addition to accomplishing the technical analysis of the
issues, students are required to interpret their "numbers."
Furthermore, the minimum course standard requires that their
case study be in a professionally acceptable format. This
includes embedding files from their statistical software
into their word processing files and integrating
mathematical exhibits with text. To reinforce the case
study's importance, the grade for the effort is one third of
their final exam.

A remark on the choice of case study is in order. We
have found that learning is far greater if the topic is taken
from actual student experience, something that affects their
lives. For example, we selected as a population the grades
of a preceding class and asked students to draw conclusions
about the types of career success these graduates enjoyed and
how that success was correlated with various academy successes:
academic grades, military leadership grades, and physical
fitness grades. This case study was far more successful as
a learning tool than an earlier one that investigated a very
important weapons system (the Bradley fighting vehicle), but
a subject that was only vicarious to sophomore-level cadets.
We concluded that having as an object of study something
that the students actually experience and see as real
imposes a sense of urgency in their study; students want
to understand those issues that influence their lives now.

Completion of the course poises students to address
problems:

- That require interval estimates of parameters.

- That establish rational decision values for
experimental variables.

- That require simple design of an experiment.
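As an example of the first kind of problem, the Python sketch below computes a large-sample confidence interval for a population mean, $\bar{x} \pm z\,s/\sqrt{n}$, with $z$ taken from the standard normal distribution. The data are simulated with a fixed seed and merely stand in for real measurements, and the function name `mean_interval` is hypothetical. The course itself used MINITAB, and with small samples a $t$ quantile would normally replace $z$.

```python
import random
from statistics import NormalDist, mean, stdev

def mean_interval(data, confidence=0.95):
    """Large-sample interval xbar +/- z * s / sqrt(n) for a population mean."""
    n = len(data)
    xbar, s = mean(data), stdev(data)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * s / n ** 0.5
    return xbar - half_width, xbar + half_width

# Hypothetical data: 40 simulated measurements centered near 13.
random.seed(0)
sample = [random.gauss(13.0, 1.0) for _ in range(40)]
low, high = mean_interval(sample)
print(f"95% interval for the mean: ({low:.2f}, {high:.2f})")
```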

The course does not leave them as skilled statisticians.
However, it does blend together the key elements of all
preceding mathematics courses. It prepares them to use
quantitative methods to solve significant and unstructured
problems that require sophisticated interpretation. And it
prepares them to communicate their findings in a clear and
professional manner.

5. Research Program.

This is a brief description of the faculty and student
research program of the Department of Mathematics. With
regard to faculty basic and applied research, the
departmental program supports the philosophy of the USMA
Superintendent, LTG Palmer. In a position paper, he stated:

"...The faculty at USMA constitutes a
valuable resource to the Army [and the
nation at large] in that nowhere else is
there as large a concentration of highly
educated personnel as at West Point. The
potential of this resource to solve Army
problems should be fully exploited...

"These officers will take this valuable
experience back to the Army with them, and
many will put it to good use in positions
in the acquisition system such as project
manager. Thus USMA will provide the Army
with officers that understand the research
process and who will not be technically at
the mercy of government contractors..." [8]

There is no question that the Department has committed its
faculty to the furtherance of knowledge in areas of
prime concern to the Army and the nation. Over 20 of the 60
officers in the Department were directly involved in
significant research or consulting projects during the last
academic year. These projects are described in [9] and
include applications in many areas of mathematics and
science, such as numerical computing, fluid dynamics, number
theory, underwater and atmospheric acoustics, probability
distributions, statistical analysis, time series, computer-aided
design, computer-aided instruction, air defense
methodology, signal processing, financial modeling, and
combat modeling. In addition, 18 officers spent time during
the summer of 1989 at an Army laboratory or government
agency performing research or consulting. Many other
instructors were involved in smaller part-time efforts. The
tenured faculty in particular were involved in this effort
through consulting with Army laboratories, schools, and
agencies, attending conferences, presenting results, and
publishing in technical journals.
Under the direction of MAJ John S. Robertson, the
Department of Mathematics research program was particularly
successful in 1989. Four members of the Department served
in Dean's Research positions and were funded by the USMA
Science Research Laboratory with money from the Army
Research Office. Several other researchers received this
same funding. For the first time ever, substantial funding
was obtained from external agencies for travel, supplies,
computer hardware, software, and library services. With
this funding and direction, the future looks bright for the
Department's research program.

One enhancement for the program may come from the
Department's use of Foundation Schools for instructor
education. Starting this year, all the non-tenured faculty
(85% of the Department's strength) will receive their
master's-level education at one of three schools, Georgia
Tech, Rensselaer Polytechnic Institute, or the Naval
Postgraduate School, with degrees in either applied
mathematics or operations research. This program will
enable the tenured faculty to interface with the officers at
an earlier stage for better control of professional
development, with emphasis on finding research opportunities
that can continue while the officer is assigned to the
Department.

The student research program is focused in two areas:
Volunteer Summer Training (VST) and a 3-credit Research
Seminar (MA 491). Over the last two years, more than 25 cadets
have participated in a 4-6 week VST research program at many
agencies, including TRADOC Analysis Center-Monterey,
Ballistic Research Laboratory, Natick Laboratory, Concepts
Analysis Agency, and Los Alamos National Laboratory.

Several cadets have completed the MA 491 course, conducting undergraduate research on topics such as numerical computing, chaos and fractals, combat modeling, and financial modeling.

As we head into the 1990s, research has taken an important place in the Department of Mathematics. Student and faculty involvement in research activities has been beneficial and rewarding and will most likely continue to grow.
Modeling and Estimation for Multiresolution Stochastic Processes

A.S. Willsky¹


1 Multiscale Representations and Homogeneous Trees

The recently introduced theory of multiscale representations and wavelet transforms [4] provides a sequence of approximations of signals at finer and finer scales. In 1-D a signal $f(x)$ is represented at the $m$th scale by a sequence $f(m,n)$, which provides the amplitudes of time-scaled pulses located at the points $n2^{-m}$. The progression from one scale to the next thus introduces twice as many points and indeed provides a tree structure, with the pair $(2^{-m}, n)$ at one scale associated with $(2^{-(m+1)}, 2n)$ and $(2^{-(m+1)}, 2n+1)$ at the next. This motivates the development of a system and stochastic process theory in which the index set is taken to be a homogeneous dyadic tree. In this paper we outline some of the basic ideas behind our work.
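To make the tree structure concrete, here is a small sketch. It assumes the simplest pulse, the Haar box function, so that $f(m,n)$ is the average of the signal over the dyadic interval $[n2^{-m}, (n+1)2^{-m})$. The helper names `haar_pyramid` and `children` are hypothetical. Each coarse value is then the mean of its two children, which is exactly the parent/child pairing described above.

```python
import numpy as np


def haar_pyramid(samples):
    """Return {m: array of f(m, n)} for a signal sampled at 2**M points.

    Assumption: f(m, n) is the Haar (box-pulse) average of the samples over the
    dyadic interval [n 2^-m, (n+1) 2^-m); the finest scale m = M holds the samples.
    """
    x = np.asarray(samples, dtype=float)
    M = int(round(np.log2(x.size)))
    if 2 ** M != x.size:
        raise ValueError("number of samples must be a power of two")
    levels = {M: x}
    for m in range(M - 1, -1, -1):
        finer = levels[m + 1]
        # Node (m, n) is the parent of (m+1, 2n) and (m+1, 2n+1).
        levels[m] = 0.5 * (finer[0::2] + finer[1::2])
    return levels


def children(m, n):
    """Coarse-to-fine links of the dyadic tree."""
    return (m + 1, 2 * n), (m + 1, 2 * n + 1)


grid = np.arange(8) / 8.0
for m, values in sorted(haar_pyramid(np.sin(2 * np.pi * grid)).items()):
    print(m, np.round(values, 3))
```

Each pass halves the number of points, so scale $m$ carries $2^m$ values, matching the doubling of points from one scale to the next.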

Let $T$ denote the index set of the tree; we use the single symbol $t$ for nodes on the tree. The scale associated with $t$ is denoted by $m(t)$, and we write $s \preceq t$ ($s \prec t$) if $m(s) \le m(t)$ ($m(s) < m(t)$). We also let $d(s,t)$ denote the distance between $s$ and $t$, and $s \wedge t$ the common "parent" node of $s$ and $t$ (e.g. $(2^{-m}, n)$ is the parent of $(2^{-(m+1)}, 2n)$ and $(2^{-(m+1)}, 2n+1)$). In analogy with the shift operator $z^{-1}$ used as the basis for describing discrete-time dynamics, we also define several shift operators on the tree: $0$, the identity operator (no move); $\gamma^{-1}$, the fine-to-coarse shift (e.g. from $(2^{-(m+1)}, 2n)$ or $(2^{-(m+1)}, 2n+1)$ to $(2^{-m}, n)$); $\alpha$, the left coarse-to-fine shift ($(2^{-m}, n)$ to $(2^{-(m+1)}, 2n)$); $\beta$, the right coarse-to-fine shift ($(2^{-m}, n)$ to $(2^{-(m+1)}, 2n+1)$); and $\delta$, the exchange operator ($(2^{-(m+1)}, 2n) \leftrightarrow (2^{-(m+1)}, 2n+1)$). Note that $0$ and $\delta$ are isometries in that they are one-to-one, onto maps of $T$ that preserve distances. Also we have the relations

$$\delta^2 = \gamma^{-1}\alpha = \gamma^{-1}\beta = 0, \qquad \gamma^{-1}\delta = \gamma^{-1}, \qquad \delta\beta = \alpha \qquad (1.1)$$

¹This research was supported in part by the Army Research Office under grant DAAL03-86-K-0171 (Center for Intelligent Control Systems), AFOSR grant AFOSR-88-0032, and the NSF under grant ECS-8700903.
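The shifts can be modeled directly on the $(m, n)$ labels. In this sketch the function names are hypothetical and positions $n \ge 0$ are assumed. A word acts on a node with its rightmost operator applied first, so `compose(gamma_inv, alpha)` is $\gamma^{-1}\alpha$. The check confirms the relations (1.1) on every node of a small tree.

```python
# Nodes are labelled (m, n): scale m (finer as m grows) and position n >= 0.
def identity(t):
    """The operator 0: no move."""
    return t


def gamma_inv(t):
    """Fine-to-coarse shift: (m+1, 2n) or (m+1, 2n+1) -> (m, n)."""
    m, n = t
    return (m - 1, n // 2)


def alpha(t):
    """Left coarse-to-fine shift: (m, n) -> (m+1, 2n)."""
    m, n = t
    return (m + 1, 2 * n)


def beta(t):
    """Right coarse-to-fine shift: (m, n) -> (m+1, 2n+1)."""
    m, n = t
    return (m + 1, 2 * n + 1)


def delta(t):
    """Exchange operator: (m, 2n) <-> (m, 2n+1), i.e. flip the last bit of n."""
    m, n = t
    return (m, n ^ 1)


def compose(*ops):
    """The word ops[0] ops[1] ... ops[-1]; the rightmost operator acts first."""
    def word(t):
        for op in reversed(ops):
            t = op(t)
        return t
    return word


def check_relations(t):
    """Check the relations (1.1) at node t."""
    return (compose(delta, delta)(t) == identity(t)
            and compose(gamma_inv, alpha)(t) == identity(t)
            and compose(gamma_inv, beta)(t) == identity(t)
            and compose(gamma_inv, delta)(t) == gamma_inv(t)
            and compose(delta, beta)(t) == alpha(t))


print(all(check_relations((m, n)) for m in range(6) for n in range(2 ** m)))  # True
```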

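It is possible to code all points on the tree via shifts from an arbitrary origin node, i.e. as $wt_0$, $w \in \mathcal{L}$, where

$$\mathcal{L} = (\gamma^{-1})^* \cup \{\alpha, \beta\}^* \cup \{\alpha, \beta\}^*\,\delta\,(\gamma^{-1})^* \qquad (1.2)$$

The length of a word $w$ is denoted $|w|$ and equals $d(wt, t)$ (e.g. $|\gamma^{-1}| = 1$, $|\delta| = 2$).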
Also, since we will be interested in coarse-to-fine dynamic models, we define some notation for causal moves:

$$w \preceq 0 \ (w \prec 0) \quad \text{if} \quad wt \preceq t \ (wt \prec t) \qquad (1.3)$$
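A companion sketch computes $|w| = d(wt, t)$ and tests the causal orderings of (1.3). The one-letter spellings (`g` for $\gamma^{-1}$; `a`, `b`, `d` for $\alpha$, $\beta$, $\delta$) and the helper names are illustrative assumptions. The distance routine simply climbs both nodes toward their common ancestor.

```python
def parent(t):
    """gamma^{-1}: (m, n) -> (m - 1, n // 2)."""
    m, n = t
    return (m - 1, n // 2)


# Hypothetical one-letter names: g = gamma^{-1}, a = alpha, b = beta, d = delta.
SHIFTS = {
    "g": parent,
    "a": lambda t: (t[0] + 1, 2 * t[1]),
    "b": lambda t: (t[0] + 1, 2 * t[1] + 1),
    "d": lambda t: (t[0], t[1] ^ 1),
}


def apply_word(word, t):
    """Apply a word such as "adg" (alpha delta gamma^{-1}); the rightmost letter acts first."""
    for letter in reversed(word):
        t = SHIFTS[letter](t)
    return t


def distance(s, t):
    """d(s, t): number of edges on the tree path between nodes s and t (n >= 0 assumed)."""
    d = 0
    while s != t:
        if s[0] > t[0]:
            s, d = parent(s), d + 1
        elif t[0] > s[0]:
            t, d = parent(t), d + 1
        else:
            s, t, d = parent(s), parent(t), d + 2
    return d


def length(word, t):
    """|w| = d(wt, t)."""
    return distance(apply_word(word, t), t)


def is_causal(word, t, strict=False):
    """w <= 0 (w < 0 if strict), i.e. m(wt) <= m(t) (m(wt) < m(t)), as in (1.3)."""
    m_wt = apply_word(word, t)[0]
    return m_wt < t[0] if strict else m_wt <= t[0]


t = (3, 5)
print(length("g", t), length("d", t), length("adg", t))  # 1 2 4
print(is_causal("adg", t), is_causal("adg", t, strict=True), is_causal("a", t))  # True False False
```

For example, $\alpha\delta\gamma^{-1}$ moves from $t$ to a node at the same scale four steps away, so it satisfies $w \preceq 0$ but not $w \prec 0$.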

2  Modeling  of  Isotropic  Processes  on  Trees 

A zero-mean process $Y_t$, $t \in T$, is isotropic if

$$E[Y_t Y_s] = r_{d(s,t)} \qquad (2.1)$$

i.e. if its second-order statistics are invariant under any isometry of $T$. These processes have been the subject of some study, and a Bochner-like spectral theorem has been developed [1,2]. However, many questions remain, including an explicit criterion for a sequence $r_n$ to be the covariance of such a process and the representation of isotropic processes as outputs of systems driven by white noise. Note first that the sequence $\{Y_{\gamma^{-n}t}\}$ is an ordinary time series, so that $r_n$ must be positive semidefinite; however, the constraints of isotropy require even more. To uncover this structure we have developed in [2] a complete characterization of the class of isotropic autoregressive (AR) models, where an AR model of order $p$ has the form

$$Y_t = \sum_{\substack{w \preceq 0 \\ 0 < |w| \le p}} a_w Y_{wt} + \sigma W_t \qquad (2.2)$$
where $W_t$ is a white noise with unit variance. Note that this model is "causal" (i.e. it has a coarse-to-fine direction of propagation), since $w \preceq 0$. Also, a first thought might be to examine models with strict past dependence, i.e. $Y_t$ a function only of $\{Y_{\gamma^{-n}t},\ n \ge 1\}$; however, as shown in [2], the constraints of isotropy allow us to show that only AR(1) has such dependence. Thus AR($p$) involves a full set of $2^{p-1}$ $a_w$'s and one $\sigma$, so that the number of parameters doubles as $p$ increases by one. In addition, as shown in [2], isotropy places numerous polynomial constraints on these parameters. As we develop in [2], a better representation is provided by a generalization of lattice structures, which involves only one new parameter as $p$ increases by one.
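The simplest case can be checked numerically. For the AR(1) model $Y_t = aY_{\gamma^{-1}t} + \sigma W_t$ with a stationary start, $E[Y_tY_s] = a^{d(s,t)}\sigma^2/(1-a^2)$, so siblings (reached by $\delta$) and grandparents (reached by $\gamma^{-2}$) share the same covariance $r_2$. The Monte Carlo sketch below illustrates this isotropy; the parameter values and the helper name are hypothetical, and the code is not taken from [2].

```python
import numpy as np


def simulate_tree_ar1(a, sigma, depth, n_rep, rng):
    """Simulate Y_t = a Y_{gamma^{-1} t} + sigma W_t on a dyadic tree, n_rep times.

    Returns a list whose entry m has shape (n_rep, 2**m) and holds Y at the nodes (m, n).
    The top node is drawn from the stationary law N(0, sigma^2 / (1 - a^2)).
    """
    p = sigma ** 2 / (1.0 - a ** 2)
    levels = [np.sqrt(p) * rng.standard_normal((n_rep, 1))]
    for m in range(1, depth + 1):
        parents = np.repeat(levels[-1], 2, axis=1)  # column n holds Y at (m-1, n // 2)
        levels.append(a * parents + sigma * rng.standard_normal((n_rep, 2 ** m)))
    return levels


rng = np.random.default_rng(0)
a, sigma = 0.7, 1.0
p = sigma ** 2 / (1 - a ** 2)
Y = simulate_tree_ar1(a, sigma, depth=3, n_rep=200_000, rng=rng)
siblings = np.mean(Y[3][:, 0] * Y[3][:, 1])     # d = 2, same scale (delta)
grandparent = np.mean(Y[3][:, 0] * Y[1][:, 0])  # d = 2, strict past (gamma^{-2})
cousins = np.mean(Y[3][:, 0] * Y[3][:, 2])      # d = 4
print(siblings, grandparent, a ** 2 * p)        # the first two should both be near r_2
```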

Let $\mathcal{H}\{\cdots\}$ denote the Gaussian linear space spanned by the variables in braces, and define the ($n$th-order) past of the node $t$:

$$\mathcal{Y}_{t,n} = \mathcal{H}\{Y_{wt} : w \preceq 0,\ |w| \le n\} \qquad (2.3)$$

As for time series, the development of models of increasing order involves recursions for the forward and backward prediction errors. Specifically, define the backward residual space $\mathcal{F}_{t,n}$ by

$$\mathcal{Y}_{t,n} = \mathcal{Y}_{t,n-1} \oplus \mathcal{F}_{t,n} \qquad (2.4)$$

where $\mathcal{F}_{t,n}$ is spanned by the backward prediction errors

$$F_{t,n}(w) = Y_{wt} - E(Y_{wt} \mid \mathcal{Y}_{t,n-1}) \qquad (2.5)$$

where $w \preceq 0$, $|w| = n$. These variables are collected into a $2^{[n/2]}$-dimensional vector $F_{t,n}$ (see [2] for the order). For $|w| < n$ and $w \asymp 0$ (i.e. $m(wt) = m(t)$), define the forward prediction errors:

$$E_{t,n}(w) = Y_{wt} - E(Y_{wt} \mid \mathcal{Y}_{\gamma^{-1}t,\,n-1}) \qquad (2.6)$$

and let $\mathcal{E}_{t,n}$ denote the span of these residuals and $E_{t,n}$ the $2^{[(n-1)/2]}$-dimensional vector of these variables (see [2]).
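The dimensions of $F_{t,n}$ and $E_{t,n}$ can be confirmed by brute-force enumeration. This sketch assumes $[x]$ denotes the integer part and uses hypothetical helper names. It counts nodes on a finite window of the tree, so the node $t$ must lie at least $n$ scales below the top of the window for no path to be cut off.

```python
def parent(t):
    m, n = t
    return (m - 1, n // 2)


def distance(s, t):
    """Number of edges on the tree path between s and t (nodes (m, n), n >= 0)."""
    d = 0
    while s != t:
        if s[0] > t[0]:
            s, d = parent(s), d + 1
        elif t[0] > s[0]:
            t, d = parent(t), d + 1
        else:
            s, t, d = parent(s), parent(t), d + 2
    return d


def dim_F(t, n):
    """Count nodes wt with w <= 0 (m(wt) <= m(t)) and |w| = n: the size of F_{t,n}."""
    return sum(1 for m in range(t[0] + 1) for k in range(2 ** m)
               if distance((m, k), t) == n)


def dim_E(t, n):
    """Count nodes wt at the scale of t (m(wt) = m(t)) with |w| < n: the size of E_{t,n}."""
    m = t[0]
    return sum(1 for k in range(2 ** m) if distance((m, k), t) < n)


t = (8, 137)  # any node at least n scales below the top (0, 0) of this finite window
for n in range(1, 9):
    print(n, dim_F(t, n), 2 ** (n // 2), dim_E(t, n), 2 ** ((n - 1) // 2))
```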

The key to the development of our models is the recursive computation of $F_{t,n}$ and $E_{t,n}$ as $n$ increases. The general idea is the same as for time series, but we must deal with the more complex geometry of the tree and the changing dimensions of $F_{t,n}$ and $E_{t,n}$. In particular, as shown in [2], it is necessary to distinguish between $n$ even and odd and between different groups of the components of $F_{t,n}$ and $E_{t,n}$. For example, $F_{t,n}$ consists of the $F_{t,n}(w)$ in eq. (2.5) with $|w| = n$, $w \preceq 0$. Suppose that $n$ is even and consider elements of $F_{t,n}$ for which $|w| = n$, $w \prec 0$. In this case $w = w'\gamma^{-1}$ for some $w' \preceq 0$ with $|w'| = n-1$, and by an argument exactly analogous to the time series case we obtain the recursion:

$$F_{t,n}(w) = F_{\gamma^{-1}t,\,n-1}(w') - E[F_{\gamma^{-1}t,\,n-1}(w') \mid \mathcal{E}_{t,n-1}] \qquad (2.7)$$

This procedure identifies several projections, as in eq. (2.7), to be calculated. A key result is that these projection operators can in fact be reduced to scalar projections involving a single new reflection coefficient and the local averages, or barycenters, of the residuals:

$$e_{t,n} = 2^{-[(n-1)/2]} \sum_{|w| < n,\ w \asymp 0} E_{t,n}(w) \qquad (2.8)$$

$$f_{t,n} = 2^{-[n/2]} \sum_{|w| = n,\ w \preceq 0} F_{t,n}(w) \qquad (2.9)$$


For example, the projection in eq. (2.7) is the same for all such $w$ and in fact equals $E[F_{\gamma^{-1}t,\,n-1}(w') \mid e_{t,n-1}]$. This and related expressions follow from the properties of isotropy and from a very important fact: any local isometry, i.e. a map $f$ from one subset of $T$ onto another that preserves distances, can be extended to a full isometry on $T$.

As a consequence of this result, we can obtain scalar Levinson recursions for the barycenters themselves [2]. These recursions introduce a sequence of reflection coefficients, $k_n$, and lead to a generalization of the Schur recursions for time series. In [2] we also show how these same $k_n$ can be used to construct whitening and modeling filters for $Y_t$, and we present a stability result analogous to the time series case. In this case, however, the condition is somewhat more complex: for $n$ odd we have the same condition as for time series, namely $|k_n| < 1$; for $n$ even, however, we must have $-\tfrac{1}{2} < k_n < 1$. In addition, we demonstrate in [2] that the class of AR($p$) processes is completely equivalent to reflection coefficient sequences with $k_n = 0$ for $n > p$, and we show that these processes are exactly the isotropic processes with impulse responses supported on a cylinder of radius $[p/2]$ about the strict past $\{\gamma^{-n}t\}$.
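The stability conditions and the AR-order characterization translate into simple checks on a finite list $k_1, k_2, \ldots$. The sketch below uses hypothetical helper names and only tests the stated conditions; it does not implement the Levinson or Schur recursions of [2].

```python
def admissible(k):
    """Stability conditions on reflection coefficients k = [k_1, k_2, ...].

    n odd: |k_n| < 1 (as for time series); n even: -1/2 < k_n < 1.
    """
    for n, kn in enumerate(k, start=1):
        if n % 2 == 1 and not abs(kn) < 1:
            return False
        if n % 2 == 0 and not -0.5 < kn < 1:
            return False
    return True


def ar_order(k):
    """Smallest p with k_n = 0 for all n > p in the finite list k."""
    return max((n for n, kn in enumerate(k, start=1) if kn != 0), default=0)


print(admissible([0.9, 0.4, -0.9]), admissible([0.5, -0.6]))  # True False
print(ar_order([0.5, 0.2, 0.0, 0.0]))                           # 2
```

The even-order lower bound of $-1/2$ is what distinguishes the tree case: a sequence such as $k_2 = -0.6$ is acceptable for time series but not here.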

3  State  Models  and  Multigrid  Estimation 

A second class of models displaying coarse-to-fine structure is specified by state models of the form

$$x(t) = A(m(t))\,x(\gamma^{-1}t) + B(m(t))\,w(t) \qquad (3.1)$$

where $w(t)$ is a vector white noise process with covariance $I$. The model (3.1) describes a process that is Markov from scale to scale; because of this, we can readily calculate its second-order statistics. For example, when $A$ and $B$ are constant and $A$ is stable, eq. (3.1) can describe stationary processes, where the covariance of $x$ satisfies the Lyapunov equation

$$P_x = A P_x A^T + B B^T \qquad (3.2)$$
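For constant $A$ and $B$, eq. (3.2) is a linear system in the entries of $P_x$. Here is a minimal sketch using NumPy's Kronecker product; the function name is hypothetical.

```python
import numpy as np


def stationary_covariance(A, B):
    """Solve the Lyapunov equation P = A P A^T + B B^T of eq. (3.2).

    With row-major vectorization, vec(A P A^T) = (A kron A) vec(P), so
    (I - A kron A) vec(P) = vec(B B^T). Requires A to be stable.
    """
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    n = A.shape[0]
    if np.max(np.abs(np.linalg.eigvals(A))) >= 1.0:
        raise ValueError("A must have spectral radius < 1")
    rhs = (B @ B.T).reshape(-1)
    P = np.linalg.solve(np.eye(n * n) - np.kron(A, A), rhs).reshape(n, n)
    return 0.5 * (P + P.T)  # remove round-off asymmetry


A = np.array([[0.5, 0.2], [0.0, 0.3]])
B = np.array([[1.0, 0.0], [0.5, 1.0]])
P = stationary_covariance(A, B)
print(np.allclose(P, A @ P @ A.T + B @ B.T))  # True
```

The Kronecker system has $n^2$ unknowns, so it suits small state dimensions; for larger ones, `scipy.linalg.solve_discrete_lyapunov(A, B @ B.T)` solves the same equation.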

and the correlation function is

$$K_{xx}(t,s) = A^{d(t,\, s\wedge t)}\, P_x\, (A^T)^{d(s,\, s\wedge t)} \qquad (3.3)$$

In the scalar case, or if $AP_x = P_xA^T$, eq. (3.1) describes an isotropic process, but in general eq. (3.1) describes a somewhat larger set of processes.
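Eq. (3.3) and the isotropy condition can be checked numerically. The sketch below uses hypothetical helper names and labels nodes $(m, n)$ with $n \ge 0$. It finds the common ancestor by climbing, evaluates $K_{xx}$, and compares a sibling pair with a grandparent pair, both at distance 2. With a symmetric $A$ and $B = I$, $P_x$ is a power series in $A$ and commutes with it, so $AP_x = P_xA^T$ holds.

```python
import numpy as np
from numpy.linalg import matrix_power


def parent(t):
    m, n = t
    return (m - 1, n // 2)


def split_distance(s, t):
    """Return (d(s, s^t), d(t, s^t)), where s^t is the closest common ancestor."""
    ds = dt = 0
    while s != t:
        if s[0] > t[0]:
            s, ds = parent(s), ds + 1
        elif t[0] > s[0]:
            t, dt = parent(t), dt + 1
        else:
            s, t, ds, dt = parent(s), parent(t), ds + 1, dt + 1
    return ds, dt


def stationary_covariance(A, B):
    """Solve P = A P A^T + B B^T (eq. 3.2) for stable A."""
    n = A.shape[0]
    rhs = (B @ B.T).reshape(-1)
    return np.linalg.solve(np.eye(n * n) - np.kron(A, A), rhs).reshape(n, n)


def K_xx(t, s, A, P):
    """Eq. (3.3): E[x(t) x(s)^T] = A^{d(t, s^t)} P (A^T)^{d(s, s^t)}."""
    ds, dt = split_distance(s, t)
    return matrix_power(A, dt) @ P @ matrix_power(A.T, ds)


A = np.array([[0.6, 0.2], [0.2, 0.3]])  # symmetric, so A P = P A^T when B = I
P = stationary_covariance(A, np.eye(2))
print(np.allclose(K_xx((3, 0), (3, 1), A, P),   # siblings: d = 1 + 1
                  K_xx((3, 0), (1, 0), A, P)))  # grandparent: d = 2 + 0; prints True
```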

Consider now the estimation of $x(t)$ based on measurements

$$y(t) = C(m(t))\,x(t) + v(t) \qquad (3.4)$$

where $v(t)$ is white noise of covariance $R(m(t))$, independent of $x$. In many problems we may only have data at the finest level; however, in some applications, such as geophysical signal processing or the fusion of multispectral data, data at multiple scales are collected and must be combined. In [3] we describe three different algorithmic structures for estimating $x(t)$ based on the measurements in eq. (3.4). One of these involves processing from one scale to the next. This structure resembles the Laplacian pyramid processing structure [4] and can be performed extremely quickly using discrete Haar transforms.

A second structure is based on the following equality, which can be derived from the Markovian structure of eq. (3.1):

$$\hat{x}(t) = L_1\,\hat{x}(\gamma^{-1}t) + L_2\big(\hat{x}(\alpha t) + \hat{x}(\beta t)\big) + L_3\, y(t) \qquad (3.5)$$

where $\hat{x}$ denotes the optimal estimate and $L_1$, $L_2$, and $L_3$ are gains (depending upon scale in general). Eq. (3.5) describes a set of equations coupled from scale to scale, which can be solved by Gauss-Seidel relaxation structured exactly as in multigrid algorithms for the solution of partial differential equations.
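The relaxation idea can be sketched on a small scalar example. This is an illustration under assumptions, not the algorithm of [3]. We take the scalar model $x(t) = a\,x(\gamma^{-1}t) + b\,w(t)$ with a measurement $y(t) = x(t) + v(t)$ at every node, write the optimal estimate as the solution of the normal (information-form) equations $J\hat{x} = h$, and relax them node by node. Each row couples a node only to its parent and its two children, which is the shape of eq. (3.5). The helper names and the heap ordering of nodes are hypothetical, and this is plain Gauss-Seidel rather than a full multigrid cycle.

```python
import numpy as np


def build_normal_equations(y, a, b, r):
    """Information form J x = h of the smoothing problem on a full dyadic tree.

    Hypothetical scalar setup: nodes in heap order (top node 0; children of i are
    2i+1 and 2i+2); x(top) ~ N(0, p) with p = b^2 / (1 - a^2);
    x(t) = a x(gamma^{-1} t) + b w(t); y(t) = x(t) + v(t), var v = r, at every node.
    """
    y = np.asarray(y, dtype=float)
    N = y.size
    J = np.eye(N) / r                       # measurement term (y(t) - x(t))^2 / r
    J[0, 0] += (1 - a ** 2) / b ** 2        # prior of the top node, 1 / p
    for i in range(1, N):
        j = (i - 1) // 2                    # parent of node i
        # prior term (x_i - a x_j)^2 / b^2
        J[i, i] += 1 / b ** 2
        J[j, j] += a ** 2 / b ** 2
        J[i, j] -= a / b ** 2
        J[j, i] -= a / b ** 2
    return J, y / r


def gauss_seidel(J, h, sweeps=200):
    """Relax x_i <- (h_i - sum_{j != i} J_ij x_j) / J_ii, visiting nodes coarse to fine.

    Row i involves only x_i, its parent and its two children, mirroring eq. (3.5).
    """
    x = np.zeros_like(h)
    for _ in range(sweeps):
        for i in range(h.size):
            x[i] = (h[i] - J[i] @ x + J[i, i] * x[i]) / J[i, i]
    return x


rng = np.random.default_rng(1)
y = rng.standard_normal(2 ** 6 - 1)          # placeholder data on a tree with 6 scales
J, h = build_normal_equations(y, a=0.8, b=0.6, r=0.5)
x_gs = gauss_seidel(J, h)
print(np.max(np.abs(x_gs - np.linalg.solve(J, h))))   # should be at round-off level
```

Since $J$ is symmetric positive definite, Gauss-Seidel converges; with these parameter values $J$ is also strictly diagonally dominant, so a few hundred sweeps are ample.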

A third algorithm involves a single fine-to-coarse sweep followed by a coarse-to-fine correction. In the first step we recursively calculate the best estimate of $x(t)$ based on observations in its descendant subtree. This recursion involves three steps, which together define a new Riccati equation: a backward prediction step to predict from $\alpha t$ and $\beta t$ to $t$; a merge step, merging these two estimates; and an update step incorporating the measurement at $t$. The merge step is the new feature that has no counterpart for standard temporal models. Once we have reached the top node of the tree, the downward sweep has the same form as the Rauch-Tung-Striebel form of the optimal smoother for temporal models (allowing, of course, for the proliferation of parallel calculations as the algorithm passes from coarser to finer scales): the best smoothed estimate at $t$ is calculated in terms of the best smoothed estimate at $\gamma^{-1}t$ and the filtered estimate at that node calculated during the upward sweep.
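The two sweeps can be sketched for a scalar, stationary special case. Several assumptions are not in the text: heap-ordered nodes, $C = 1$, constant $A = a$, $B = b$, $R = r$, and a top node drawn from the stationary law. Under these assumptions the fine-to-coarse model is $x(\gamma^{-1}t) = a\,x(t) + \tilde w(t)$ with $\operatorname{var}\tilde w = p(1-a^2)$. The function names are hypothetical, and the result is checked against a direct solution of the normal equations rather than presented as the implementation of [3].

```python
import numpy as np


def tree_smoother(y, a, b, r):
    """Fine-to-coarse filtering sweep followed by a coarse-to-fine RTS-type sweep.

    Hypothetical scalar, stationary setup with heap-ordered nodes (children of i are
    2i+1, 2i+2): x(t) = a x(gamma^{-1} t) + b w(t), var x(t) = p = b^2 / (1 - a^2),
    y(t) = x(t) + v(t) with var v = r.
    """
    y = np.asarray(y, dtype=float)
    N = y.size
    p = b ** 2 / (1 - a ** 2)
    q_up = p - a ** 2 * p              # noise variance of x(parent) = a x(t) + noise
    xf, Pf = np.zeros(N), np.zeros(N)  # estimate of x(t) from the subtree of t, and variance
    xu, Pu = np.zeros(N), np.zeros(N)  # prediction of x(parent of t) from the subtree of t
    for i in reversed(range(N)):       # upward sweep: finest nodes first
        info, lin = 1.0 / p, 0.0       # prior N(0, p) in information form
        for c in (2 * i + 1, 2 * i + 2):
            if c < N:                  # merge: child subtrees are independent given x(t)
                info += 1.0 / Pu[c] - 1.0 / p
                lin += xu[c] / Pu[c]
        xm, Pm = lin / info, 1.0 / info
        K = Pm / (Pm + r)              # update with the measurement at t
        xf[i], Pf[i] = xm + K * (y[i] - xm), (1 - K) * Pm
        xu[i], Pu[i] = a * xf[i], a ** 2 * Pf[i] + q_up   # backward prediction step
    xs = xf.copy()                     # at the top node the filtered estimate is smoothed
    for i in range(1, N):              # downward sweep: parents before children
        j = (i - 1) // 2
        G = a * Pf[i] / Pu[i]
        xs[i] = xf[i] + G * (xs[j] - xu[i])
    return xs


def posterior_mean_direct(y, a, b, r):
    """Reference answer: solve the normal equations J x = y / r directly."""
    y = np.asarray(y, dtype=float)
    N = y.size
    J = np.eye(N) / r
    J[0, 0] += (1 - a ** 2) / b ** 2
    for i in range(1, N):
        j = (i - 1) // 2
        J[i, i] += 1 / b ** 2
        J[j, j] += a ** 2 / b ** 2
        J[i, j] -= a / b ** 2
        J[j, i] -= a / b ** 2
    return np.linalg.solve(J, y / r)


rng = np.random.default_rng(3)
y = rng.standard_normal(31)            # placeholder data on a tree with 5 scales
x_two_sweep = tree_smoother(y, a=0.8, b=0.6, r=0.5)
print(np.max(np.abs(x_two_sweep - posterior_mean_direct(y, a=0.8, b=0.6, r=0.5))))
```

The `info += 1/Pu[c] - 1/p` line is the merge step: the prior is counted once even though both children's predictions already contain it.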
ABSTRACT. We present stochastic model-based methods both for restoring images corrupted by impulse noise and for detecting edges in an image, whether caused by changes in intensity or by changes in texture.

The image is represented by a nonsymmetric half-plane autoregressive model driven by impulse-contaminated Gaussian noise. This type of noise is more commonly encountered in real images than the pure Gaussian noise treated in earlier papers. We develop an image restoration algorithm given only the image corrupted by additive noise, the original clean image or its model being unknown. We show that this method gives much better results than currently available methods based on median filters or alpha-trimmed filters.

Next we develop methods that can detect both intensity edges and texture edges. It is well known that traditional edge detection methods have difficulty detecting texture boundaries. We first generate edge hypotheses and then use two different procedures to confirm whether each is an intensity edge or a texture edge. We give several examples to illustrate the efficacy of the proposed approach.

I. INTRODUCTION AND OVERVIEW. In the past decade, there has been remarkable progress in research on statistical image models and their applications. Statistical image models (often called random field models or spatial interaction models) represent the image intensity of a given picture by a small number of parameters. There are many applications of image models in image processing and analysis. For instance, they can be used for image synthesis (Kashyap, 1984; Cross and Jain, 1983), image restoration (Chellappa and Kashyap, 1982; Geman and Geman, 1984), image coding (Delp et al., 1979), texture boundary detection (Kashyap and Eom, 1985a), and texture analysis (Kashyap and Khotanzad, 1984).

To apply image models to such image processing tasks, we need to estimate the parameters in the image models. There are many different estimation algorithms for different image models, but most of these methods are based on the assumption of a Gaussian image intensity distribution. However, the actual distribution of image intensity deviates from the Gaussian assumption, and traditional estimation methods are very sensitive to even minor deviations from it. During the past few decades, many estimators that are robust to deviations from the Gaussian assumption have been proposed (Huber, 1981), but they are rarely applied to image modeling.

In this study, robust estimation procedures for several different image models are developed and applied to important image processing problems such as image segmentation and image restoration.

A.  Robust  Statistical  Procedures 

There has been considerable interest in robust methods in statistics in recent years. This is because most statistical inference methods are based on rather restrictive assumptions about the observations and models, such as the independence and the distribution of the observations. However, these assumptions do not always hold, and many statistical procedures are very sensitive to minor deviations from the given assumptions. For example, it is well known that least squares methods are excessively sensitive to a small number of outliers.

The term robust was introduced by G.E.P. Box in 1953. A procedure is called robust if it is reasonably good (optimal or near optimal) when the assumption holds and is not sensitive to small deviations from the assumption. Robustness primarily means distributional robustness, i.e., robustness against small deviations from the assumed distribution (usually Gaussian). Resistance to outliers is considered equivalent to distributional robustness (Huber, 1981).

There are several types of robust procedures: M-estimators, L-estimators, and R-estimators. Among these, M-estimators have an advantage over the other procedures because they can be extended to parameter estimation problems in image models. In contrast, both L-estimators and R-estimators are difficult to generalize beyond one-parameter location or scale problems. In this study, robust M-estimators are applied to the parameter estimation problem of causal autoregressive models. Two different outlier processes are considered, and iterative robust estimation algorithms are developed for both. Theoretical properties of the proposed robust estimators are investigated.
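As a 1-D illustration of an iterative robust M-estimator for a causal AR model, the sketch below fits an AR(1) coefficient by iteratively reweighted least squares with Huber's $\psi$ function. The tuning constant $c = 1.345$, the MAD scale estimate, the impulse-contaminated innovations, and the helper name are our assumptions; this is not the 2-D algorithm of the study.

```python
import numpy as np


def huber_ar1(y, c=1.345, iters=50):
    """Huber M-estimate of theta in y[k] = theta * y[k-1] + e[k], computed by IRLS.

    The residual scale is re-estimated at each pass as 1.4826 * median |residual|.
    """
    y = np.asarray(y, dtype=float)
    x, z = y[:-1], y[1:]                  # regressor y[k-1] and response y[k]
    theta = np.dot(x, z) / np.dot(x, x)   # least-squares starting value
    for _ in range(iters):
        res = z - theta * x
        s = max(1.4826 * np.median(np.abs(res)), 1e-12)
        u = np.abs(res) / (c * s)
        w = 1.0 / np.maximum(u, 1.0)      # Huber weight psi(r)/r: 1 inside, c*s/|r| outside
        theta = np.dot(w * x, z) / np.dot(w * x, x)
    return theta


rng = np.random.default_rng(0)
n, theta_true = 2000, 0.8
e = rng.standard_normal(n)
hits = rng.random(n) < 0.05                       # about 5% impulsive innovations
e[hits] += rng.choice([-15.0, 15.0], size=int(hits.sum()))
y = np.zeros(n)
for k in range(1, n):
    y[k] = theta_true * y[k - 1] + e[k]
x, z = y[:-1], y[1:]
print("least squares:", np.dot(x, z) / np.dot(x, x), "Huber:", huber_ar1(y))
```

Residuals beyond $c$ times the robust scale receive weights that shrink like $1/|r|$, so the impulsive innovations influence the fit far less than they would in least squares.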

B.  Image  Models 

Image models characterize the image intensity surface with a small number of parameters. Image models can be divided into two groups, namely, descriptive and generative models. A descriptive model for an image summarizes the intensity distribution into a finite number of statistics. An example is the co-occurrence matrix (Haralick, 1973) used in texture analysis. A generative model, on the other hand, allows one to synthesize an image obeying the given model by using the model description and a set of random numbers. We will restrict ourselves to generative models, since they can be used for a wide variety of applications.

We can further divide the generative models into two large classes. In the first class, the observed intensity function $y(i,j)$ is assumed to be the sum of a deterministic function (usually a polynomial or sinusoid) and additive noise. In the second class, the image intensity function is generated as the output of a transfer function whose input is a sequence of independent random variables. The transfer function represents the known structural information on the image surface; the independent random sequence accounts for the unknown part. Note that, unlike in the first class, neighboring pixels are highly correlated, and the transfer function accounts for this covariance.

C.  Applications 

Image restoration and image segmentation are two important branches of image processing. Image restoration is needed to recover the original image from an image corrupted by noise (including impulse noise), and image segmentation, especially edge or boundary detection, is involved in most high-level image processing problems. In this study, robust image models are developed and applied to these two problems.


1.  Image  Restoration 

An image may be subject to noise and interference from many different sources, and image restoration is used to remove noise from the given image. Traditionally, the noise is assumed to be Gaussian, and many different restoration algorithms based on the Gaussian assumption have been introduced (Pratt, 1978; Rosenfeld and Kak, 1982).

Recently, image models have been used in image restoration applications. For example, Chellappa and Kashyap (1982) used a simultaneous autoregressive model and a conditional Markov model, Wu (1985) used a nonsymmetric half-plane autoregressive model with a two-dimensional Kalman filtering approach, and Geman and Geman (1984) used a family of Markov models. Even though these examples show some successful applications of image models to image restoration, all of the above methods are designed to remove Gaussian noise and are not very effective at removing impulse noise (Pratt, 1978).

Traditionally, the median filter and its generalizations (Kassam and Poor, 1985) have been used to remove impulse noise (also called salt-and-pepper noise) from the noisy image. These methods are simple applications of robust location parameter estimators, such as the median or the α-trimmed mean, where the image intensity is assumed constant over a small window. However, images restored by these methods are blurred (Pratt, 1978).

In our study, robust image model approaches are applied to the image restoration problem. The original image intensity is assumed to follow an image model, and its parameters are estimated by a robust estimation algorithm. The image is then restored by applying a data cleaning algorithm with the robustly estimated parameters. In our experiments, the robust model-based method performs better than the traditional methods.
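The restoration pipeline (fit a model, then clean the data) can be illustrated in 1-D. The cleaner below is a hypothetical stand-in, not the algorithm of this study. It predicts each sample from the already-cleaned past with an AR(1) model whose parameters are assumed known (standing in for robust estimates), and it replaces samples whose prediction residual exceeds $k\sigma$. A 3-point median filter is included for comparison.

```python
import numpy as np


def clean_ar1(y_obs, theta, sigma, k=3.0):
    """Hypothetical one-pass AR(1) data cleaner for additive impulse noise.

    A sample is kept if its one-step prediction residual from the already-cleaned
    past is within k * sigma; otherwise it is replaced by the prediction.
    """
    c = np.array(y_obs, dtype=float)
    for t in range(1, c.size):
        pred = theta * c[t - 1]
        if abs(c[t] - pred) > k * sigma:
            c[t] = pred
    return c


def median3(y):
    """3-point running median (end points unchanged), for comparison."""
    y = np.asarray(y, dtype=float)
    out = y.copy()
    out[1:-1] = np.median(np.stack([y[:-2], y[1:-1], y[2:]]), axis=0)
    return out


rng = np.random.default_rng(1)
n, theta, sigma = 1000, 0.9, 1.0
y = np.zeros(n)
for t in range(1, n):
    y[t] = theta * y[t - 1] + sigma * rng.standard_normal()
hit = rng.random(n) < 0.05
hit[0] = False                        # assume the first sample is clean so the cleaner starts well
y_obs = y.copy()
y_obs[hit] += rng.choice([-25.0, 25.0], size=int(hit.sum()))   # salt-and-pepper-like impulses


def mse(estimate):
    return np.mean((estimate - y) ** 2)


print(mse(y_obs), mse(median3(y_obs)), mse(clean_ar1(y_obs, theta, sigma)))
```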

2.  Edge  and  Segment  Boundary  Detection 

Edge detection, or boundary detection, is a fundamental step in scene analysis. Traditionally, an edge is defined as a boundary between two regions, where the intensity of each region is uniform and the intensity difference between the two regions is large. Most edge detection algorithms are based on the gradient operator or the Laplacian operator (Robinson, 1977), both of which respond to changes in intensity. Recently, some model-based edge detection approaches have been proposed (Haralick, 1984; Zhou and Chellappa, 1986), but they are also derivative-based methods, using decision rules with estimated model parameters.
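For reference, a gradient-based intensity edge detector of the kind described above can be sketched with 3×3 Sobel kernels. The choice of operator is ours and the function name is hypothetical. On a clean step it responds only along the step, but it would also respond inside a textured region, which is the limitation discussed next.

```python
import numpy as np


def sobel_magnitude(img):
    """Gradient magnitude from 3x3 Sobel kernels over the valid region.

    Output entry (i, j) corresponds to pixel (i + 1, j + 1) of the input.
    """
    f = np.asarray(img, dtype=float)
    # Derivative across columns: right column minus left column, weights 1, 2, 1.
    gx = (f[:-2, 2:] + 2 * f[1:-1, 2:] + f[2:, 2:]) - (f[:-2, :-2] + 2 * f[1:-1, :-2] + f[2:, :-2])
    # Derivative across rows: bottom row minus top row, weights 1, 2, 1.
    gy = (f[2:, :-2] + 2 * f[2:, 1:-1] + f[2:, 2:]) - (f[:-2, :-2] + 2 * f[:-2, 1:-1] + f[:-2, 2:])
    return np.hypot(gx, gy)


step = np.zeros((8, 8))
step[:, 4:] = 10.0                       # intensity edge between columns 3 and 4
g = sobel_magnitude(step)
edges = g > 0.5 * g.max()                # simple threshold on the gradient magnitude
print(edges.astype(int))
```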

For higher-level processing, the edges should separate the shape of each object from the background of an image. However, intensity edges are sometimes unsatisfactory for representing an object and distinguishing it from the background, because the intensity of an object or a background is not uniform. For instance, a grass lawn in an outdoor scene is homogeneous in its texture, but it has many intensity edges within the region. This example suggests the need to detect boundaries (or edges) by their texture properties.

Image models have already been used to synthesize textures that are very similar to real textures, and the parameters obtained by fitting an image model to a given image can be used as texture features. Texture features derived from image models or from other methods can be used to segment an image by a statistical classification method if the number and types of textures in the given image are known in advance. However, such prior information is generally not available.

A composite edge detection algorithm is developed in this study. It combines the model-based texture boundary detection method with a conventional intensity edge detection method. The algorithm detects all potential edges by a directional derivative method, and each candidate is then confirmed as either a texture edge or an intensity edge. In our experiments, the composite algorithm is compared with conventional edge detection methods, which detect only intensity edges, and performs better than them.

II.  AR  AND  ARMA  MODELS. 


A.  Introduction 

It is traditionally claimed that a complete stochastic description of an $M \times M$ array of pixel intensities $y(s)$ is given by the joint probability density of the $M^2$ intensity variables $y(\cdot)$. Even writing down the expression is horrendous, considering that the typical value of $M$ is 128, 256, or 512. As a consequence, it was often conjectured that probabilistic models may not be of much use in solving interesting problems in image processing. The purpose of this paper is to draw attention to the existence of a large class of image models that can be characterized completely in terms of the second-order properties of the image sequence, i.e., the correlations $E[y(s)y(s+r)]$ or the corresponding spectral density. Consequently, these models are relatively easy to analyze. It must be emphasized that the joint probability density of all the intensities is not assumed to be Gaussian.

In the beginning, we will focus our attention on the two-dimensional generalizations of the autoregressive (AR) and autoregressive moving average (ARMA) models popular in time series analysis. Basically, all these two-dimensional models can handle rational spectral densities, i.e., ratios of two linear combinations of sinusoids in the two frequency variables, just as in the one-dimensional case. However, there are many differences between the 1D and 2D cases, which will be highlighted in this section. For example, in the 1D case the correlation function is an exponentially decaying function of the lag variable, but in the 2D case one rarely encounters an exponential correlation function. Similarly, in the 1D case the driving input random sequence is both statistically independent and uncorrelated with the dependent variables in the past. In the general 2D case, the input sequence cannot possess both these properties simultaneously.

Secondly, we will consider the various possible ways of defining the weak Markov property in the 2D case. By weak, we mean that the corresponding Markov property can be described completely in terms of second-order properties like the correlation or the spectral density. The traditional Markov property defined in terms of probability densities is termed the strong Markov property. A sequence cannot be strong Markov without being weak Markov. We will characterize the various subclasses of 2D AR and ARMA models that possess the various types of weak Markov property.

We note that the AR and ARMA models mentioned above are, in general, not recursive. Still, these models are generative in principle, i.e., it is possible to give an algorithm that generates a sequence obeying a prespecified model. However, the amount of computation involved may be considerable. We will consider modifications or approximations of the AR and ARMA models that make it relatively easy to synthesize an image obeying a given model.

Preliminaries: 

We will consider a covariance stationary array of real numbers $\{y(i,j),\ -\infty < i,j < \infty\}$, $i,j$ being integers. Throughout, $i,j,k$ stand for integers, and $s,t,r$ stand for two-dimensional vectors specifying the grid points. Often we are given a finite $M \times M$ image $\{y(i,j),\ (i,j) \in \Omega\}$, $\Omega = \{(i,j): 0 \le i,j \le M-1\}$. $y(s)$ is the intensity at the grid point $s$. Typically, if $s = (i,j)$, $i$ stands for the row number, numbered increasingly from top to bottom, and $j$ is the column number, numbered from left to right. The corresponding vector of real frequencies is denoted by $\lambda = (\lambda_1, \lambda_2)$, $\lambda_1$ being the row frequency and $\lambda_2$ the column frequency. Similarly, $z_1$ and $z_2$ are the unit lead operators in the row and column directions, respectively. Specifically, $z_1 y(i,j) = y(i+1,j)$ and $z_2 y(i,j) = y(i,j+1)$. We will also interpret the $z_i$ as complex variables by the relation $z_i = \exp[\sqrt{-1}\,\lambda_i]$, $i = 1,2$. $\lambda, z, r, s, t$, etc., will be considered to be row vectors. The vectors composed of image intensities $y(\cdot)$ or of the input primitive random variables $v(\cdot), w(\cdot)$ will usually be column vectors. An image is said to have a trend if $E[y(i,j)]$ is a deterministic function of $i$ and $j$. An image is said to be covariance stationary if the covariance function defined below is a function of $i$ and $j$ alone and not a function of $s$, and hence is denoted by $R(i,j)$:

$$E[(y(s) - \bar{y})(y(s+(i,j)) - \bar{y})] = R(i,j)$$

where $\bar{y} = E[y(s)]$.

A covariance stationary random field in which $\bar{y}$ is a constant is called weakly stationary. A random field $\{y(s)\}$ is said to be isotropic if $R(i,j) = R(|i|,|j|) = R(j,i)$. For a covariance stationary RF, we can define a spectral density:

$$S(\lambda) = \sum_{s \in \mathbb{Z}^2} R(s)\exp[-\sqrt{-1}\, s \cdot \lambda] = \sum_{s_1=-\infty}^{\infty}\ \sum_{s_2=-\infty}^{\infty} R(s_1,s_2)\exp[-\sqrt{-1}\,(s_1\lambda_1 + s_2\lambda_2)]$$


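Another important second-order measure of an RF model is the variogram $V_r(s) = E[(y(s) - y(s+r))^2]$, which depends only on $r$ if $y(\cdot)$ is weakly stationary. The covariance function $R(\cdot)$ can be recovered from $S(\lambda)$ by the usual Fourier integral

$$R(r) = \frac{1}{(2\pi)^2}\int_{-\pi}^{\pi}\!\int_{-\pi}^{\pi} S(\lambda)\exp[\sqrt{-1}\,\lambda \cdot r]\,|d\lambda|, \qquad \lambda = (\lambda_1,\lambda_2),\quad r = (i,j),\quad |d\lambda| = |d\lambda_1|\,|d\lambda_2|$$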
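The second-order quantities above can be estimated from a single image. This sketch uses hypothetical helper names and averages over all in-image pixel pairs without wrap-around. It checks the estimates on a synthetic field whose covariance is known, including the relation $V_r = 2(R(0) - R(r))$ that holds for weakly stationary fields.

```python
import numpy as np


def _lag_pairs(d, i, j):
    """All pairs (d[s], d[s + (i, j)]) with both points inside the image (no wrap-around)."""
    M, N = d.shape
    r0, r1 = max(0, -i), M - max(0, i)
    c0, c1 = max(0, -j), N - max(0, j)
    return d[r0:r1, c0:c1], d[r0 + i:r1 + i, c0 + j:c1 + j]


def sample_covariance(y, i, j):
    """Estimate R(i, j) = E[(y(s) - ybar)(y(s + (i, j)) - ybar)] from one image."""
    y = np.asarray(y, dtype=float)
    a, b = _lag_pairs(y - y.mean(), i, j)
    return np.mean(a * b)


def sample_variogram(y, i, j):
    """Estimate V_r = E[(y(s) - y(s + r))^2] for r = (i, j)."""
    a, b = _lag_pairs(np.asarray(y, dtype=float), i, j)
    return np.mean((a - b) ** 2)


rng = np.random.default_rng(0)
e = rng.standard_normal((257, 256))
y = e[:-1] + e[1:]     # y(i, j) = e(i, j) + e(i+1, j): R(0,0) = 2, R(1,0) = 1, R(0,1) = 0
print(sample_covariance(y, 0, 0), sample_covariance(y, 1, 0), sample_covariance(y, 0, 1))
print(sample_variogram(y, 1, 0))   # close to 2 (R(0,0) - R(1,0)) = 2
```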


Another important concept is the neighbor set. A neighbor set is a set of grid points whose coordinates are near 0, but 0 itself is not a member of a neighbor set. A neighbor set $N$ is said to be symmetric if $r \in N \Rightarrow -r \in N$.

Popular neighbor sets are the ones having 4 nearest neighbors and 8 nearest neighbors:

    x           x x x
  x • x         x • x
    x           x x x

4-neighbor N    8-neighbor N

A neighbor set N is said to be semicausal if the row coordinates (or the column coordinates) of all its members have the same sign. Some examples of semicausal neighbor sets are given below:

x x x x x    x x • x x    x x    x x
x x • x x    x x x x x    x •    • x
                          x x    x x

where • stands for the origin.
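The neighbor-set definitions can be encoded as sets of offsets $(i, j)$. In this sketch the names are hypothetical, and "same sign" is read as allowing zero, so a set whose members all have row offset $\le 0$ counts as semicausal.

```python
N4 = {(-1, 0), (1, 0), (0, -1), (0, 1)}
N8 = N4 | {(-1, -1), (-1, 1), (1, -1), (1, 1)}


def is_neighbor_set(N):
    """The origin may not belong to a neighbor set."""
    return (0, 0) not in N


def is_symmetric(N):
    """r in N implies -r in N."""
    return all((-i, -j) in N for (i, j) in N)


def _one_sign(values):
    return all(v >= 0 for v in values) or all(v <= 0 for v in values)


def is_semicausal(N):
    """All row offsets (or all column offsets) share one sign, zero allowed."""
    return _one_sign([i for i, _ in N]) or _one_sign([j for _, j in N])


# A semicausal set: the full row above the origin plus two points on each side of it.
upper = {(-1, j) for j in range(-2, 3)} | {(0, j) for j in (-2, -1, 1, 2)}
print(is_symmetric(N8), is_semicausal(N8), is_semicausal(upper))  # True False True
```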

B.  2D  AR  Processes 

Consider a real-valued stationary process possessing a spectral density of the form

$$S(\lambda) = 1/[\text{positive linear combination of sinusoids in } \lambda_1, \lambda_2]$$


Our first step is to enquire whether y(·) can be expressed as the output of a system characterized by a two-dimensional rational transfer function of finite order, the input being some elementary stochastic process, say v(·). Toward this end, consider the system described by the following difference equation, where v(·) is the elementary input:

$$y(s) = \sum_{r\in N}\theta_r\, y(s+r) + \sqrt{p}\, v(s), \qquad \theta_r = \theta_{-r}, \tag{1}$$

where N, a so-called neighbor set, is a set of grid points possessing symmetry, i.e., if s ∈ N, then -s ∈ N. No neighbor set can have the origin 0 as a member; however, neighbor sets in general need not be symmetric. Define the two-dimensional polynomial A(z1,z2) in terms of the coefficients θ_r:

$$A(z_1,z_2) = A(z) = 1 - \sum_{(i,j)\in N}\theta_{i,j}\, z_1^{i} z_2^{j}.$$

The coefficients {θ_r} in (1) obey the following condition, defined in terms of the polynomial A:

$$A(z_1,z_2) > 0 \quad \forall\, |z_1| = 1 \text{ and } |z_2| = 1. \tag{2}$$

In addition, the input v(·) in (1) is assumed to have zero mean and to be orthogonal to all y(s+r), r ≠ 0, i.e.,

$$E[v(s)\,y(s+r)] = 0 \quad \forall\, r \neq 0. \tag{3}$$

We also assume E[v²(s)] = 1, so the parameter p in (1) specifies the relative power of the input term.

We can also rewrite Eq. (1) compactly in terms of the polynomial A:

$$A(z)\,y(s) = \sqrt{p}\, v(s). \tag{4}$$

In (4), the z_i are interpreted as unit lead operators in the two directions, e.g., z_1 y(s_1,s_2) = y(s_1+1, s_2).

Equation (3) defines the process v(·) only indirectly, and the precise structure of v(·) is not obvious. We will later derive an expression for the spectral density of v(·) using (1)-(3).

Equation (3) can be thought of as defining a v(·) process given a y(·) process. It is not obvious here how to generate y(·) and v(·) sequences that simultaneously obey (1)-(3). We will later show constructively that infinite sequences y(·) and v(·) obeying (1)-(3) do exist.

Structure of the v(·) process

The following theorem gives the spectral densities of the processes y(·) and v(·) that obey (1)-(3).

Theorem 1: The spectral densities of y and v obeying (1)-(3) are

$$S_{yy}(\lambda) = \frac{p}{A_1(\lambda)}, \tag{5}$$

$$S_{vv}(\lambda) = A_1(\lambda), \tag{6}$$

where $A_1(\lambda) = A(z_1,z_2)$ with $z_j = \exp[\sqrt{-1}\,\lambda_j]$.

Proof:

We will obtain a difference equation for the covariance function of y. Note that E[y(·)] = 0: taking expectations in (1) gives A(1,1)E[y] = 0, and A(1,1) > 0 by (2). Let R(t) = E[y(s)y(s+t)]. Multiply (1) by v(s), take expectations of both sides, and use (3):

$$E[y(s)v(s)] = \sqrt{p}\,E[v^2(s)] = \sqrt{p}. \tag{7}$$


Next, multiply (1) by y(s+t) on both sides and take expectations:

$$R(t) = \sum_{r\in N}\theta_r R(r-t) + \sqrt{p}\,E[v(s)y(s+t)] = \sum_{r\in N}\theta_r R(r-t) + p\,\delta_{t,0}, \tag{8}$$

by using (3) and (7), where δ_{t,0} = 1 if t = 0 and δ_{t,0} = 0 otherwise.


Take the Fourier transform of (8):

$$\Big(1 - \sum_{r\in N}\theta_r \exp[\sqrt{-1}\,\lambda\cdot r]\Big) S_{yy}(\lambda) = p,$$

i.e., $S_{yy}(\lambda) = p/A_1(\lambda)$, which is (5).


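To prove (6), take the spectral density of both sides of (4):

$$p\,S_{vv}(\lambda) = \big|A(z_1 = \exp(\sqrt{-1}\,\lambda_1),\, z_2 = \exp(\sqrt{-1}\,\lambda_2))\big|^2 S_{yy}(\lambda) = |A_1(\lambda)|^2 S_{yy}(\lambda).$$

Using (5) for S_yy(λ), and noting that A_1(λ) is real and positive by (2), the above equation yields the required expression (6) for S_vv(λ).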

The proof is given in some detail because it yields the difference equation (8) for R_y(t). In addition, the proof indicates the existence of a process y(·) obeying (1)-(3) by exhibiting its spectral density.

The v(·) process is an analog of a one-dimensional moving average process. Its covariance function is

$$E[v(s)v(s+r)] = \begin{cases} -\theta_r, & r\in N,\\ 1, & r = 0,\\ 0, & \text{elsewhere}.\end{cases} \tag{9}$$


However, one important distinction between the 1D and 2D cases is that v(·) cannot, in general, have a 2D moving average representation, i.e., it cannot be represented as a finite linear combination of independent random variables. The reason is that the symmetric polynomial A(z1,z2) cannot, in general, be factored as a product of two finite polynomials.

Converse of Theorem 1:

This section started with assumption (3) on v(·). What would be the structure of the process y(·) if v(·) were assumed to be white? We will prove the converse of Theorem 1 and show that a process with an inverse sinusoidal spectral density does not, in general, have any representation other than (1). The exceptions will be handled later.

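Theorem 2: Consider a zero-mean stationary process y(·) having the spectral density

$$S_{yy}(\lambda) = p/[\text{a positive linear combination of sinusoids in } \lambda_1, \lambda_2],$$

i.e.,

$$S_{yy}(\lambda) = p/A(z_1,z_2), \qquad z_i = \exp(\sqrt{-1}\,\lambda_i), \tag{10}$$

with

$$A(z_1,z_2) = 1 - \sum_{r\in N}\theta_r z^r, \qquad z^r = z_1^{r_1} z_2^{r_2},$$

where N is symmetric, θ_r = θ_{-r}, and A(·) obeys (2). Define v(·) as

$$v(s) \triangleq \Big(y(s) - \sum_{r\in N}\theta_r\, y(s+r)\Big)\Big/\sqrt{p}.$$

Then

$$E[v(s)y(s+r)] = 0 \quad \forall\, r\neq 0.$$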


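Proof: By definition,

$$v(s) = A(z)\,y(s)/\sqrt{p}.$$

Multiply both sides by y(s+t) and take expectations:

$$R_{vy}(-t) = A(z)\,R_{yy}(t)/\sqrt{p}.$$

Take the Fourier transform of both sides:

$$S_{vy}(\lambda) = A(z_1,z_2)\,S_{yy}(\lambda)/\sqrt{p} = \sqrt{p} \quad \text{by (10)}, \qquad z_i = \exp(\sqrt{-1}\,\lambda_i).$$

Since the cross spectral density is constant, the cross covariance vanishes at every nonzero lag. Hence E[v(s)y(s+r)] = 0 if r ≠ 0.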

Expression  for  the  Correlation 

In the one-dimensional case, when the spectral density is a ratio of linear combinations of sinusoids, the correlation function is a linear combination of exponentially decaying functions of the lag. Such a result does not hold in the 2D case, where exponential correlation functions are rare. We can, however, evaluate the correlations from the spectral density by numerical integration. We give one example below.

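Example: Consider the 4-member symmetric neighbor set. Let

$$y(s) = \theta\sum_{r\in N} y(s+r) + \sqrt{p}\,v(s), \qquad N = \{(i,j): |i| + |j| = 1\} = \{(\pm1,0),(0,\pm1)\}.$$

The spectral density is

$$S(\lambda) = \frac{p}{1 - 2\theta(\cos\lambda_1 + \cos\lambda_2)}.$$

Since cos λ1 + cos λ2 ranges over [-2, 2], condition (2) holds if and only if |θ| < 1/4. Here y(·) is isotropic.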
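The numerical integration mentioned above can be sketched for this example as follows, assuming NumPy. The inverse Fourier integral is approximated by a rectangle rule on an M×M frequency grid, which is what `numpy.fft.ifft2` evaluates; the helper names and the values θ = 0.2, p = 1, M = 256 are illustrative choices. The grid result is the exact covariance of the same model on an M×M torus, so it satisfies the difference equation (8) up to rounding while approaching R(r) as M grows. The same transform applied to A_1 recovers the covariance (9) of v(·).

```python
import numpy as np

# 4-member neighbor set N = {(+-1, 0), (0, +-1)}
N4 = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def covariance_from_spectrum(theta, p, M=256):
    """Approximate R(i, j) = (2 pi)^-2 * integral of S(lam) exp(-i lam.r) dlam
    for S(lam) = p / (1 - 2 theta (cos lam1 + cos lam2)) by a rectangle rule
    on the grid lam_k = 2 pi k / M.
    Returns (R, A1): R is M x M, indexed by (i mod M, j mod M), and A1 holds
    A_1(lam) on the grid."""
    if abs(theta) >= 0.25:
        raise ValueError("condition (2) requires |theta| < 1/4")
    lam = 2.0 * np.pi * np.arange(M) / M
    l1, l2 = np.meshgrid(lam, lam, indexing="ij")
    A1 = 1.0 - 2.0 * theta * (np.cos(l1) + np.cos(l2))
    # ifft2 computes (1/M^2) * sum_k S_k exp(+i 2 pi k.r / M); S is real and
    # even, so the sign of the exponent does not change the result.
    return np.real(np.fft.ifft2(p / A1)), A1


def eq8_residual(R, theta, t):
    """R(t) - sum_{r in N} theta R(r - t); by (8) this is p at t = 0, else 0."""
    return R[t] - theta * sum(R[r[0] - t[0], r[1] - t[1]] for r in N4)


theta, p = 0.2, 1.0
R, A1 = covariance_from_spectrum(theta, p)
print(R[0, 0], R[1, 0], R[1, 1], R[2, 0])
print(eq8_residual(R, theta, (0, 0)), eq8_residual(R, theta, (1, 0)))

# Covariance of v(.) from S_vv = A1, cf. (9): 1 at the origin, -theta on N.
Rv = np.real(np.fft.ifft2(A1))
print(Rv[0, 0], Rv[1, 0], Rv[1, 1])
```

Negative indices in `eq8_residual` wrap around the torus, which matches the periodic grid; increasing M shrinks the wrap-around (aliasing) error.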

For a discussion of other models, see (Kashyap and Eom, 1988).

III. ROBUST ESTIMATION IN CAUSAL AUTOREGRESSIVE MODELS.


A.  Introduction 

The importance of model-based techniques for image processing tasks such as edge detection, image synthesis, image coding, and image restoration has been well documented. However, in all of these models the image intensity array is assumed to have a multivariate Gaussian distribution. The Gaussian assumption is used primarily in estimating the parameters of the image model fitted to the image, where it makes the estimation procedure relatively easy; for example, for the causal autoregressive model, the maximum likelihood method is the same as the least squares method. However, it is well known that in many applications the Gaussian assumption is not appropriate.

A more realistic assumption is contaminated Gaussian noise:

$$\zeta(i,j) = \begin{cases} w(i,j), & \text{with probability } 1-\beta,\\ v(i,j), & \text{with probability } \beta,\end{cases} \tag{11}$$

where w(i,j) is regular white Gaussian noise, v(i,j) is an outlier process, and the fraction of outliers β is assumed to be small (less than 5%).
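For simulation studies, noise following (11) can be drawn directly. The following is a minimal sketch assuming NumPy. Since (11) does not specify the law of the outlier process v(i,j), the wide Gaussian used here (standard deviation `outlier_sd`) is an assumption, as are the function name and the default values.

```python
import numpy as np


def contaminated_noise(shape, beta=0.05, sigma=1.0, outlier_sd=20.0, rng=None):
    """Draw zeta(i, j) as in (11).

    With probability 1 - beta the sample is regular noise w ~ N(0, sigma^2);
    with probability beta it is an outlier v.  The outlier law
    N(0, outlier_sd^2) is an assumption made for illustration only.
    Returns the noise field and a boolean mask marking the outliers."""
    rng = np.random.default_rng() if rng is None else rng
    w = rng.normal(0.0, sigma, size=shape)
    v = rng.normal(0.0, outlier_sd, size=shape)
    is_outlier = rng.random(shape) < beta
    return np.where(is_outlier, v, w), is_outlier


zeta, mask = contaminated_noise((64, 64), rng=np.random.default_rng(0))
print(zeta.shape, mask.mean())               # outlier fraction close to beta
print(zeta[~mask].std(), zeta[mask].std())   # regular part vs. outliers
```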

Unfortunately, least squares estimators, or maximum likelihood estimators under the Gaussian assumption, are very sensitive to minor deviations from the Gaussian noise assumption. Even a single bad datum (outlier) among 1000 observations can cause a large error in the estimate. Because of this excessive sensitivity of least squares estimators, a robust estimator is needed for image models. A robust estimator should possess the following properties:

(1) It should have reasonably good (optimal or nearly optimal) efficiency at the assumed noise distribution.

(2) It should be robust in the sense that a small number of outliers impair its performance only slightly.

(3) Somewhat larger deviations from the assumed distribution should not cause a catastrophe.

The resistance to outliers (e.g., impulse noise) is equivalent to distributional robustness by Hampel's theorem (Huber, 1981). Many different robust estimation algorithms have been developed in the last twenty years, mostly for location parameter estimation. These algorithms can be classified into three broad types: M-estimators, L-estimators, and R-estimators. An M-estimator is a maximum likelihood type estimator obtained by solving a minimization problem. An L-estimator is a linear combination of order statistics. An R-estimator is derived from rank tests. For image models we are mostly interested in M-estimators: they are easy to extend to image model problems, whereas the other types are difficult to use in problems other than simple location parameter estimation.

An M-estimator is defined by the following minimization problem:

$$\text{Minimize } \sum_i \rho(x_i;\theta) \tag{12}$$

or, equivalently, by the solution of the following implicit equation:

$$\sum_i \psi(x_i;\theta) = 0 \tag{13}$$

where ρ is a continuous, differentiable, convex function possessing a bounded and continuous derivative ψ(x) = ∂ρ/∂x, and ρ is symmetric about the origin with ρ(0) = 0. The convexity of ρ ensures the equivalence of (12) and (13). The boundedness and continuity of ψ are essential for the robustness of the M-estimator. If ψ is not bounded, a single gross outlier can completely upset the estimator. If ψ is not continuous, small changes in the observations x_i may produce a large change in the estimator.
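The effect of a bounded ψ can be seen in the simplest case, location estimation. The sketch below, assuming NumPy, solves (13) for a location parameter with Huber's clipped ψ by iterative reweighting. Fixing the scale at the normalized median absolute deviation is our simplifying assumption, and the data are synthetic. A single gross outlier among 1000 observations shifts the sample mean (the least squares estimate) by about 1000 but barely moves the M-estimate.

```python
import numpy as np


def huber_psi(u, c=1.5):
    """Bounded, continuous psi: u clipped to [-c, c]."""
    return np.clip(u, -c, c)


def m_location(x, c=1.5, tol=1e-10, max_iter=500):
    """Solve sum_i psi((x_i - theta) / s) = 0 for theta.

    The scale s is fixed at the normalized median absolute deviation (an
    assumption; (12)-(13) do not say how the scale is obtained).  Each step
    is a weighted mean with weights psi(u)/u, i.e. min(1, c/|u|)."""
    x = np.asarray(x, dtype=float)
    theta = np.median(x)
    s = 1.4826 * np.median(np.abs(x - theta))
    for _ in range(max_iter):
        u = (x - theta) / s
        tiny = np.abs(u) < 1e-12
        w = np.where(tiny, 1.0, huber_psi(u, c) / np.where(tiny, 1.0, u))
        theta_new = np.sum(w * x) / np.sum(w)
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta


rng = np.random.default_rng(1)
x = rng.normal(10.0, 1.0, size=1000)
x_bad = x.copy()
x_bad[0] = 1.0e6  # a single gross outlier

print(np.mean(x), np.mean(x_bad))        # least squares: shifts by about 1000
print(m_location(x), m_location(x_bad))  # M-estimate: almost unchanged
```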

There are several different definitions of the robustness of an estimator (Huber, 1981). Qualitative robustness is defined by weak continuity of the estimator; an M-estimator is qualitatively robust if and only if the corresponding ψ is bounded and continuous. A minimax robust estimator minimizes the maximum degradation over ε-deviations from the assumed distribution; the M-estimator of location is optimal in the sense of minimax robustness. Quantitative robustness is defined by the property that the asymptotic bias and asymptotic variance change only slightly in the contaminated neighborhood.

Even though a robust procedure is necessary in most image processing applications, very little research has been done on the use of robust procedures in image processing. In this section, we develop estimation algorithms for the causal autoregressive image model.


B.  Causal  Autoregressive  Model 

It is well known that a large class of images can be effectively represented by various types of image models involving a small number of parameters (Kashyap, 1981). Image models are already used in image coding (Delp et al., 1979), image synthesis, texture analysis, and edge detection (Kashyap and Eom, 1985a). There are many different types of image models, and they can be classified into two large classes by their second-order statistical structure: classical short-correlation models and long-correlation models. These image models and their general properties are discussed by Kashyap (1981).

The causal autoregressive model is a generalization of the one-dimensional autoregressive model. This model is simple but has good modelling performance, as shown in previous studies. Consider the following m×n image (Figure 1).

Figure 1. An m×n image and three causal neighbors
Assume that the image intensity in this image follows the three-neighbor causal autoregressive model. Let (i,j) be the index of a coordinate location and y(i,j) be the intensity at (i,j). Then the three causal neighbors of this pixel are {y(i-1,j), y(i,j-1), y(i-1,j-1)}. This causality comes from the convention of raster scanning, and because of it the resulting two-dimensional model has all the convenience of the one-dimensional model.

Suppose that {ζ(i,j)} is a two-dimensional white noise sequence with outliers as assumed in (11), and that the variance of the regular part of the noise is σ². Then the three-neighbor causal autoregressive model is represented by the following equation:

$$y(i,j) = \theta^T z(i,j) + \zeta(i,j) \tag{14}$$

where θ is a parameter vector and z(i,j) is a vector consisting of the intensities of the three causal neighbors and unity. The last element of z(i,j) represents the constant grey level in the image.


$$z(i,j) = \begin{bmatrix} y(i,j-1)\\ y(i-1,j)\\ y(i-1,j-1)\\ 1\end{bmatrix} \tag{15}$$

It is assumed that every pixel has all of its neighbors, i.e., for each pixel at (i,j), the pixels at (i,j-1), (i-1,j), and (i-1,j-1) are available.
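Under Gaussian noise the least squares fit of (14) is the maximum likelihood estimate, and it is the usual starting point for robust procedures. The sketch below, assuming NumPy, stacks the vectors z(i,j) of (15) for every pixel that has all three causal neighbors and solves the least squares problem. The helper names, the simulated parameter values and the zero first row and column are illustrative assumptions.

```python
import numpy as np


def causal_regressors(y):
    """Rows z(i, j)^T = [y(i, j-1), y(i-1, j), y(i-1, j-1), 1] as in (15),
    and targets y(i, j), for all pixels with i >= 1 and j >= 1."""
    y = np.asarray(y, dtype=float)
    target = y[1:, 1:].ravel()
    Z = np.column_stack([
        y[1:, :-1].ravel(),     # y(i, j-1)
        y[:-1, 1:].ravel(),     # y(i-1, j)
        y[:-1, :-1].ravel(),    # y(i-1, j-1)
        np.ones(target.size),   # constant grey level
    ])
    return Z, target


def simulate_causal_ar(theta, shape, sigma=1.0, rng=None):
    """Generate y(i, j) = theta^T z(i, j) + Gaussian noise in raster-scan
    order, starting from a zero first row and column (an arbitrary choice)."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.zeros(shape)
    for i in range(1, shape[0]):
        for j in range(1, shape[1]):
            z = np.array([y[i, j - 1], y[i - 1, j], y[i - 1, j - 1], 1.0])
            y[i, j] = theta @ z + sigma * rng.standard_normal()
    return y


theta_true = np.array([0.4, 0.3, -0.1, 5.0])
y = simulate_causal_ar(theta_true, (200, 200), rng=np.random.default_rng(2))
Z, t = causal_regressors(y)
theta_ls, *_ = np.linalg.lstsq(Z, t, rcond=None)
print(theta_ls)  # close to theta_true for Gaussian noise
```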

We consider robust parameter estimation of the causal autoregressive model for two cases of outliers. In the first case, we assume that the process y(i,j) given in (14) can be perfectly observed; the outlier process is then involved only in the noise ζ(i,j) that generates y(i,j). In the second case, we assume that the observation x(i,j) of the process y(i,j) is corrupted by noise ξ(i,j), as given by the following equation:

$$x(i,j) = y(i,j) + \xi(i,j). \tag{16}$$

The noise process ξ is assumed to contain outliers. In this case, the outliers are involved not only in generating y(i,j) but also in the observation. In the next section, robust parameter estimation will be discussed for these two different cases of outliers.

C.  Robust  Parameter  Estimation  with  Perfect  Observations 

The parameters of the image model given in (14) can be estimated by a robust M-estimator, which is a generalization of the location M-estimator. Define the following function Q(θ,σ):

$$Q(\theta,\sigma) = \frac{1}{mn}\sum_{i,j}\rho\!\left(\frac{y(i,j)-\theta^T z(i,j)}{\sigma}\right)\sigma + \frac{1}{2}\sigma \tag{17}$$

where ρ is a continuous, differentiable, and convex function possessing a bounded derivative, and it is symmetric about the origin with ρ(0) = 0. Then the M-estimator of the causal autoregressive model is defined by the following minimization problem:

$$\text{Minimize } Q(\theta,\sigma). \tag{18}$$

The M-estimator can also be obtained by solving the following two equations simultaneously.

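$$\nabla_\theta Q(\theta,\sigma) = -\frac{1}{mn}\sum_{i,j}\psi\!\left(\frac{y(i,j)-\theta^T z(i,j)}{\sigma}\right) z^T(i,j) = 0 \tag{19}$$

$$\frac{\partial Q(\theta,\sigma)}{\partial\sigma} = \frac{1}{2} - \frac{1}{mn}\sum_{i,j}\chi\!\left(\frac{y(i,j)-\theta^T z(i,j)}{\sigma}\right) = 0 \tag{20}$$

where ψ(x) = ∂ρ/∂x and χ(x) = xψ(x) - ρ(x); the function χ is continuous and bounded.

The following ρ, ψ, and χ functions satisfy the above conditions on these functions. In this section, it is assumed that the following functions are used in our robust estimation algorithm.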


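$$\rho(x) = \begin{cases}\frac{1}{2}x^2, & |x|\le c,\\ c|x| - \frac{1}{2}c^2, & |x| > c,\end{cases}$$

$$\psi(x) = \frac{d\rho}{dx} = \begin{cases} c, & x > c,\\ x, & -c\le x\le c,\\ -c, & x < -c,\end{cases}$$

$$\chi(x) = x\psi(x) - \rho(x) = \frac{1}{2}[\psi(x)]^2.$$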
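The functions above, and one way of solving (19)-(20) with them, can be sketched as follows (NumPy assumed). At a fixed point of the iteration the θ-step is zero, which is (19), and σ² equals the mean of [σψ(r/σ)]², which is (20) because χ = ψ²/2. The iteration scheme (a modified-residual step for θ and a fixed-point step for σ), the function names and the straight-line demonstration data are our assumptions, not necessarily the authors' algorithm. In the image case the rows of Z would be the vectors z(i,j)ᵀ of (15).

```python
import numpy as np

C = 1.5  # tuning constant c


def rho(x, c=C):
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= c, 0.5 * x**2, c * np.abs(x) - 0.5 * c**2)


def psi(x, c=C):
    return np.clip(x, -c, c)


def chi(x, c=C):
    return x * psi(x, c) - rho(x, c)  # equals 0.5 * psi(x)**2


def m_estimate(Z, t, c=C, max_iter=500, tol=1e-10):
    """Solve (19)-(20) for theta and sigma, starting from least squares."""
    theta = np.linalg.lstsq(Z, t, rcond=None)[0]
    sigma = np.sqrt(np.mean((t - Z @ theta) ** 2))
    for _ in range(max_iter):
        sigma = max(sigma, 1e-12)                    # guard against sigma = 0
        r = t - Z @ theta
        r_mod = sigma * psi(r / sigma, c)            # modified residuals
        step = np.linalg.lstsq(Z, r_mod, rcond=None)[0]
        theta = theta + step                         # zero step <=> (19)
        sigma_new = np.sqrt(np.mean(r_mod ** 2))     # fixed point <=> (20)
        converged = np.max(np.abs(step)) < tol and abs(sigma_new - sigma) < tol
        sigma = sigma_new
        if converged:
            break
    return theta, sigma


# Demonstration (hypothetical data): t = 2 x + 1 with 5% gross outliers.
rng = np.random.default_rng(3)
xs = rng.uniform(0.0, 10.0, 1000)
t = 2.0 * xs + 1.0 + rng.normal(0.0, 0.5, 1000)
t[rng.random(1000) < 0.05] += 50.0
Z = np.column_stack([xs, np.ones_like(xs)])

theta_ls = np.linalg.lstsq(Z, t, rcond=None)[0]
theta_m, sigma_m = m_estimate(Z, t)
print(theta_ls, theta_m, sigma_m)
```

With the constant 1/2 in (20), σ is defined by the mean of ψ²(r/σ) being 1; it is not rescaled to match the standard deviation of Gaussian noise.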
Asymptotic Property

The asymptotic properties of the robust M-estimator for autoregression were investigated by Nasburg and Kashyap (1975). The asymptotic properties of the one-dimensional autoregression also apply to the two-dimensional causal autoregressive model. First, the following conditions are assumed:

(i) {y(i,j)} is a weakly stationary random sequence.

(ii) ψ(·) is an odd, monotone increasing function satisfying a Lipschitz condition.

(iii) The noise process ζ has finite moments up to third order.

(iv) E[ψ(ζ(i,j) + c)] = ψ(c) for all c.

Now define $\hat\theta_N$ as an M-estimator that satisfies (18) and is computed with sample size N. The following Theorem 6 and Theorem 7 are from Nasburg and Kashyap (1975).

Theorem 6 (Consistency): Under the above assumptions,

$$\hat\theta_N \to \theta \quad \text{as } N\to\infty \text{ w.p. } 1.$$

Theorem 7 (Asymptotic Normality): Under the above assumptions, $\sqrt{N}(\hat\theta_N - \theta)$ converges in distribution to a normal distribution with zero mean and variance $V_1/V_2^2$, where

$$V_1 = \frac{1}{1-\theta^2}\,E[\psi^2(\zeta(i,j))]$$

and

$$V_2 = \frac{1}{1-\theta^2}\,E[\psi'(\zeta(i,j))].$$
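The expectations E[ψ²(ζ)] and E[ψ'(ζ)] that govern the asymptotic variance have closed forms for Huber's ψ when ζ is standard Gaussian. Their ratio E[ψ²]/(E[ψ'])² is the familiar factor by which an M-estimator's asymptotic variance exceeds that of least squares (for which ψ(x) = x gives a ratio of 1). The short sketch below, using only Python's `math` module, evaluates the resulting Gaussian efficiency (E[ψ'])²/E[ψ²] for a few values of c; taking ζ to be N(0, 1) is our assumption.

```python
import math


def huber_moments(c):
    """E[psi(Z)^2] and E[psi'(Z)] for Z ~ N(0, 1) and psi(x) = clip(x, -c, c)."""
    Phi = 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))         # N(0,1) CDF at c
    phi = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)  # N(0,1) density at c
    e_dpsi = 2.0 * Phi - 1.0                                 # P(|Z| <= c)
    # E[Z^2; |Z| <= c] = (2 Phi - 1) - 2 c phi, plus c^2 P(|Z| > c)
    e_psi2 = e_dpsi - 2.0 * c * phi + 2.0 * c * c * (1.0 - Phi)
    return e_psi2, e_dpsi


def gaussian_efficiency(c):
    """(E[psi'])^2 / E[psi^2]: 1 for least squares, below 1 for clipping."""
    e_psi2, e_dpsi = huber_moments(c)
    return e_dpsi ** 2 / e_psi2


for c in (1.345, 1.5, 2.0):
    print(c, round(gaussian_efficiency(c), 4))
```

Over the range 1.5 to 2 suggested below for c, the loss of efficiency under purely Gaussian noise is only a few percent.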


Choice of ψ function

A good choice of ψ function is important not only for the robustness of the estimator but also for the fast convergence of the iterative procedure. The theoretical results in Section III.C are developed with the following monotone ψ function ψ_HL:

$$\psi_{HL}(x) = \begin{cases} c, & x > c,\\ x, & |x|\le c,\\ -c, & x < -c.\end{cases} \tag{24}$$

Typical values for c are between 1.5 and 2.
Figure 2. Hard limiter type ψ function

Even before the theoretical work on robust estimation, the 3σ edit rule was used for data cleaning for many years. The 3σ rule is a simple implementation of a hard rejection rule and corresponds to the following choice of ψ function.

Figure 3. ψ function for the 3σ rule

The above ψ function is obviously not continuous. Its discontinuity is undesirable for robust estimation, as discussed before.
Another interesting ψ function is the following Hampel's ψ function. Hampel's function is also continuous, but it returns to zero outside some interval. It is known that a redescending ψ function yields higher efficiency than a monotone ψ function for extremely heavy-tailed distributions (Huber, 1981; Rey, 1983). This advantage of the redescending ψ function is also confirmed in our experiment: the procedure converges much faster with Hampel's redescending ψ function (26) than with Huber's monotone ψ function (24). This function performed best with parameters a = 2, b = 2.5, c = 4.5 in our experiment.


$$\psi_{HA}(x) = \begin{cases} x, & |x|\le a,\\ \dfrac{a(b-x)}{b-a}, & a < x\le b,\\ -\dfrac{a(b+x)}{b-a}, & -b\le x < -a,\\ 0, & |x| > b.\end{cases} \tag{26}$$

Figure 4. Hampel's ψ function


These three ψ functions were compared in the experiment, and the best-performing function was chosen for our algorithm. Hampel's function performed better than the other functions in our experiment with the parameter values given above.
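The three ψ functions compared here can be written down directly. In the sketch below (NumPy assumed), the 3σ-rule ψ is taken to be the usual hard-rejection form, ψ(x) = x for |x| ≤ 3 and 0 otherwise, because its formula is not given in the text. Hampel's ψ follows the two-parameter form (26), extended as an odd function; since (26) involves only a and b, the value c = 4.5 quoted above does not enter. A crude numerical continuity check, the largest jump between neighboring grid points, separates the discontinuous 3σ rule from the other two.

```python
import numpy as np


def psi_hl(x, c=1.5):
    """Hard limiter (Huber) psi of (24)."""
    return np.clip(x, -c, c)


def psi_3sigma(x):
    """Hard rejection (3-sigma edit rule): keep |x| <= 3, reject the rest."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 3.0, x, 0.0)


def psi_hampel(x, a=2.0, b=2.5):
    """Redescending psi of (26): x on [-a, a], linear descent to 0 at |x| = b,
    zero beyond; odd in x."""
    x = np.asarray(x, dtype=float)
    ax = np.abs(x)
    descent = np.sign(x) * a * (b - ax) / (b - a)
    return np.where(ax <= a, x, np.where(ax <= b, descent, 0.0))


def largest_jump(psi, lo=-6.0, hi=6.0, n=120001):
    """Largest change between neighboring grid points (spacing 1e-4)."""
    grid = np.linspace(lo, hi, n)
    return np.max(np.abs(np.diff(psi(grid))))


for name, f in [("hard limiter", psi_hl), ("3-sigma", psi_3sigma),
                ("Hampel", psi_hampel)]:
    print(name, largest_jump(f))
```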


IV.  IMAGE  RESTORATION  WITH  ROBUST  IMAGE  MODELLING  TECHNIQUES. 
A.  Introduction 

Restoration of an image in the presence of noise is one of the fundamental problems in image processing. Let x(i,j) be the observed image intensity at location (i,j), obtained from the original (uncorrupted) image intensity y(i,j) corrupted by additive white noise:

$$x(i,j) = y(i,j) + \xi(i,j) \tag{27}$$


To restore the image intensity {y(i,j)} from the observation {x(i,j)}, we generally make assumptions on the noise process {ξ(i,j)} and on the original image intensity y(i,j). A common assumption on the noise process is that its distribution is Gaussian. However, the assumption of Gaussian noise has been seriously questioned, as we discussed in the previous section. A more realistic assumption is that the noise is a mixture of Gaussian and impulse noise:

$$\xi(i,j) = \begin{cases} w(i,j), & \text{with probability } 1-\beta,\\ v(i,j), & \text{with probability } \beta,\end{cases} \tag{28}$$

where w(i,j) is regular Gaussian noise and v(i,j) is an outlier; β is the fraction of outliers and is usually less than 5%.

There are many image restoration methods based on the Gaussian noise assumption. Chellappa and Kashyap (1982) used a spatial interaction model to represent the image intensity array and restored images with the minimum mean square error criterion. Geman and Geman (1984) used the equivalence of Markov random fields and Gibbs distributions and restored images by a stochastic relaxation method with the maximum a posteriori criterion. Bovik et al. (1985) used an order-constrained least squares method. Wu (1985) used a multidimensional Kalman filtering approach with a nonsymmetric half-plane autoregressive model. Chan and Lim (1985) used a cascade of four 1D adaptive filters in four different directions.

Unfortunately, most image restoration methods based on the Gaussian noise assumption are not effective for impulse noise (Rosenfeld and Kak, 1982). The impulsive component of the noise, also called salt-and-pepper noise, affects only a small portion (usually less than 5%) of the image, but it is difficult to remove by methods based on the Gaussian assumption because its amplitude is much higher than the signal amplitude. The importance of this problem has long been recognized. Traditionally, nonlinear filtering methods such as the median filter (Pratt, 1978) or the α-trimmed mean filter (Bovik et al., 1983) are used to remove impulse noise from the image. These methods use a sliding window: the grey level of the center pixel of the window is estimated by the median or α-trimmed mean of the samples in the window, and the center pixel is replaced by this estimate.
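The sliding-window filters just described can be sketched as follows (NumPy 1.20 or later assumed, for `sliding_window_view`). Border pixels use edge replication, and α is taken as the fraction trimmed from each end of the sorted window; both conventions are our assumptions. The second example previews the blurring discussed next: across a vertical step edge the median filter keeps the step, while the α-trimmed mean produces intermediate grey levels.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _windows(img, size):
    """All size x size windows, one per pixel, flattened to the last axis."""
    pad = size // 2
    padded = np.pad(np.asarray(img, dtype=float), pad, mode="edge")
    win = sliding_window_view(padded, (size, size))  # shape (m, n, size, size)
    return win.reshape(win.shape[0], win.shape[1], size * size)


def median_filter(img, size=3):
    return np.median(_windows(img, size), axis=-1)


def trimmed_mean_filter(img, size=3, alpha=0.2):
    """Drop floor(alpha * size^2) samples from each end, average the rest."""
    if not 0.0 <= alpha < 0.5:
        raise ValueError("alpha must lie in [0, 0.5)")
    s = np.sort(_windows(img, size), axis=-1)
    k = int(alpha * s.shape[-1])
    return s[..., k:s.shape[-1] - k].mean(axis=-1)


# A flat image with one impulse: both filters remove it.
flat = np.full((5, 5), 100.0)
flat[2, 2] = 255.0
print(median_filter(flat)[2, 2], trimmed_mean_filter(flat)[2, 2])  # 100.0 100.0

# A vertical step edge: the median keeps it, the trimmed mean smears it.
edge = np.zeros((6, 6))
edge[:, 3:] = 100.0
print(median_filter(edge)[2, 2:4], trimmed_mean_filter(edge)[2, 2:4])
```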

These traditional nonlinear filtering methods, such as the median filter or the α-trimmed mean filter, are based on robust location estimators that use linear combinations of order statistics (robust L-estimators) (Huber, 1981). Methods based on order statistics have been used for robust estimation of the location parameter since the 18th century (Rey, 1983). The median and the generalized median (a linear combination of order statistics) are resistant to contamination by outliers. However, they assume a constant grey level in the window applied to the image. Obviously, this constant-intensity assumption is inaccurate: the image intensity in a window changes continuously, especially near edges or corners. Because of this assumption, methods based on linear combinations of order statistics, such as the median filter or the α-trimmed mean filter, produce blurry results. The blurring is more severe for the α-trimmed mean filter than for the median filter because of its averaging effect, even though the mean square error of the α-trimmed mean filter is smaller than that of the median filter. The median filter generally does better in preserving edges and corners, but well-known examples (Pratt, 1978) show that it also blurs the image.

There are two difficulties in solving the blurring problem of traditional methods such as the median filter or the α-trimmed mean filter. First, the intensity function in the window applied to the image is unknown and difficult to represent by a simple function. Second, the linear combination of order statistics used in traditional methods has difficulty accommodating changing intensity. Although a facet model-based approach (Yasuoka and Haralick, 1983) has been proposed to reduce the blurring effect after removing impulse noise, it is based on the least squares estimator, which is not robust to impulse noise. We propose a restoration method that uses a statistical image model to represent changing intensity and a type of robust method, the so-called M-estimator.

We can use one of the image models mentioned earlier to represent intensity change in a window of the original image. The parameters of the image model can be estimated by the robust M-estimator, as shown in Section III, and the robust M-estimator of the causal autoregressive model can be obtained by the iterative algorithm given there. This estimation algorithm includes a data cleaning procedure at each iteration that reduces the outliers in the observed data. The convergence property of the robust parameter estimation algorithm is also discussed in Section III. Because the M-estimator of the causal autoregressive model converges, the parameter estimates converge as the number of iterations increases, and the image data become noise free. By this data cleaning procedure, we obtain an image from which most of the impulse noise has been removed while the original sharpness of the edges is preserved. The iterative data cleaning procedure converges relatively fast: in most of our experiments, it converged after only three iterations with almost noise-free results. The restoration algorithm based on the robust estimation algorithm has many advantages over traditional methods such as the median filter or the α-trimmed mean filter; the comparison with other methods will be discussed later.


B.  Intensity  Representation  for  Restoration 


The objective of the restoration problem is to estimate the original image intensity y(i,j) from the given sequence x(i,j). We will fit a causal autoregressive model to the original (noise-free) image y(·).

Let (i,j) be the index of a coordinate location and y(i,j) be the intensity at (i,j). Then the three-neighbor causal autoregressive model is represented by the following equation:

$$y(i,j) = \theta^T z(i,j) + \zeta(i,j) \tag{29}$$

where θ is a parameter vector, {ζ(i,j)} is a two-dimensional white noise sequence with outliers as in (28), and z(i,j) is a vector consisting of the intensities of the three causal neighbors and unity. The last element of z(i,j) represents the constant grey level in the image.


$$z(i,j) = \begin{bmatrix} y(i,j-1)\\ y(i-1,j)\\ y(i-1,j-1)\\ 1\end{bmatrix} \tag{30}$$

It is assumed that every pixel has all of its neighbors, i.e., for each pixel at (i,j), the pixels at (i,j-1), (i-1,j), and (i-1,j-1) are available.

We assume that the observation x(i,j) of the process y(i,j) is corrupted by noise ξ(i,j), as given by the following equation:

$$x(i,j) = y(i,j) + \xi(i,j). \tag{31}$$

The noise process ξ is assumed to contain outliers.


C.  Image  Restoration  Algorithm 

The purpose of image restoration is to remove noise, including impulse noise, from the image. The image degradation process can be represented by the following equation:

$$x(i,j) = y(i,j) + \xi(i,j)$$

where x is the observation, y is the original image intensity, and ξ is the noise process with outliers. Image restoration involves estimating the original intensity y from the observation x. For a small image, the original intensity can be modelled by a causal autoregressive model. If the original intensity indeed obeys a causal autoregressive model, it can be recovered by the robust estimation algorithm for the noisy observation case (Eom and Kashyap, 1988). The data cleaning procedure removes outliers at each iteration without degrading the original signal.

The restoration method based on the robust image model has an advantage over conventional methods such as the median filter or the α-trimmed mean filter: it does not blur images after restoration. Conventional methods replace every pixel by its location estimate, and because they are based on the constant-intensity assumption, the details of the original image are significantly blurred.

The procedure at each iteration is described in the block diagram of Figure 5, and the algorithm is summarized below.


Image  Restoration  Algorithm 


1. Divide the image into small (8×8) windows. Steps 2-6 below are applied to each window.

2. Let {x(i,j)} represent the given noisy data in the window and {y^(k)(i,j)} the cleaned data at the k-th iteration. Initially, y^(0)(i,j) = x(i,j) for all (i,j). Compute the initial estimates θ^(0) and σ^(0) by the least squares method:

$$\theta^{(0)} = \Big[\sum_{i,j} z^{(0)}(i,j)\,z^{(0)T}(i,j)\Big]^{-1}\Big[\sum_{i,j} z^{(0)}(i,j)\,y^{(0)}(i,j)\Big] \tag{32}$$

and

$$\sigma^{(0)2} = \frac{1}{mn}\sum_{i,j}\big[y^{(0)}(i,j) - \theta^{(0)T}z^{(0)}(i,j)\big]^2 \tag{33}$$

where m and n are the row and column dimensions of the image (here, the window) and z^(k)(i,j) is the following state vector:

$$z^{(k)}(i,j) = \begin{bmatrix} y^{(k)}(i,j-1)\\ y^{(k)}(i-1,j)\\ y^{(k)}(i-1,j-1)\\ 1\end{bmatrix} \tag{34}$$

3. Consider the k-th iteration, k ≥ 0. Compute the residuals r^(k)(i,j) and the modified residuals r̃^(k)(i,j) for all pixels in the window by the following formula, using the current parameter estimates (from step 2 when k = 0, and from step 5 afterwards):

$$r^{(k)}(i,j) = y^{(k)}(i,j) - \theta^{(k)T}z^{(k)}(i,j) \tag{35}$$

where ψ is a bounded and continuous function as discussed in Section III (e.g., Hampel's redescending ψ function).

4. Restore the image by the following rule (data cleaning):

$$y^{(k+1)}(i,j) = \theta^{(k)T}z^{(k)}(i,j) + \tilde r^{(k)}(i,j) \tag{37}$$

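5. Update the estimates of the parameter θ and the scale parameter σ² by the following formulas:

$$\theta^{(k+1)} = \theta^{(k)} + \Big[\sum_{i,j} z^{(k)}(i,j)\,z^{(k)T}(i,j)\Big]^{-1}\Big[\sum_{i,j} z^{(k)}(i,j)\,\tilde r^{(k)}(i,j)\Big] \tag{38}$$

and

$$\sigma^{(k+1)2} = \frac{1}{mn}\sum_{i,j}\big[\tilde r^{(k)}(i,j)\big]^2 \tag{39}$$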


6.  Repeat  steps  3-5  until  the  difference  between  estimates  in  successive  iterations  becomes 
small. 
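The six steps above can be sketched for a single window in Python with NumPy. This is an illustrative sketch, not the authors' code. Equation (36) is not reproduced in this section, so the sketch assumes the usual data-cleaning form $\tilde r = \hat\sigma\,\psi(r/\hat\sigma)$; the Hampel constants `a`, `b`, `c`, the stopping tolerance, and the function names `hampel_psi`, `state_vectors` and `clean_window` are hypothetical. Only pixels with $i, j \ge 1$, whose causal neighbours lie inside the window, are regressed and cleaned.

```python
import numpy as np


def hampel_psi(u, a=1.7, b=3.4, c=8.5):
    """Hampel's three-part redescending psi-function.

    The constants a, b, c are illustrative assumptions, not values from the text.
    """
    u = np.asarray(u, dtype=float)
    x = np.abs(u)
    mag = np.where(x <= a, x,
          np.where(x <= b, a,
          np.where(x <= c, a * (c - x) / (c - b), 0.0)))
    return np.sign(u) * mag


def state_vectors(y):
    """Rows z(i,j) = [y(i,j-1), y(i-1,j), y(i-1,j-1), 1] for all i, j >= 1 (eq. 34)."""
    z = np.stack([y[1:, :-1], y[:-1, 1:], y[:-1, :-1],
                  np.ones_like(y[1:, 1:])], axis=-1)
    return z.reshape(-1, 4)


def clean_window(x, psi=hampel_psi, max_iter=10, tol=1e-6, eps=1e-12):
    """Iterative robust data cleaning of one window (steps 2-6, sketch only)."""
    y = np.asarray(x, dtype=float).copy()
    inner = y[1:, 1:].shape

    # Step 2: initial least-squares estimates, eqs. (32)-(33).
    Z, t = state_vectors(y), y[1:, 1:].ravel()
    theta = np.linalg.lstsq(Z, t, rcond=None)[0]
    sigma = max(np.sqrt(np.mean((t - Z @ theta) ** 2)), eps)

    for _ in range(max_iter):
        Z, t = state_vectors(y), y[1:, 1:].ravel()
        r = t - Z @ theta                                  # residuals, eq. (35)
        r_mod = sigma * psi(r / sigma)                     # ASSUMED form of eq. (36)
        y_next = y.copy()
        y_next[1:, 1:] = (Z @ theta + r_mod).reshape(inner)            # eq. (37)
        theta_next = theta + np.linalg.lstsq(Z, r_mod, rcond=None)[0]  # eq. (38)
        sigma_next = max(np.sqrt(np.mean(r_mod ** 2)), eps)            # eq. (39)

        # Step 6: stop when successive estimates barely change.
        done = (np.max(np.abs(theta_next - theta)) < tol
                and abs(sigma_next - sigma) < tol)
        y, theta, sigma = y_next, theta_next, sigma_next
        if done:
            break
    return y, theta, sigma
```

Calling `clean_window` on every 8x8 block of a contaminated image corresponds to step 1. `np.linalg.lstsq` stands in for the explicit inverse in (32) and (38); the two agree whenever the normal matrix is invertible, and `lstsq` also copes with flat windows where it is singular.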


The  properties  of  the  algorithm  are  discussed  in  (Eom  and  Kashyap,  1988). 




Figure 5. Block diagram of the image restoration method at each iteration. $y^{(k)}$ and $y^{(k+1)}$ are the cleaned data at the $k$-th and $(k+1)$-th iterations, respectively, $\hat\theta$ and $\hat\sigma$ are the parameter estimates obtained by algorithm 1, and $r^{(k)}$ is the residual.


D.  Experimental  Results 


The restoration algorithm based on the robust modelling approach is applied to the five pictures shown in Figure 6. Figure 6.a is a 256x256 picture of a bridge. Figure 6.b is a 256x256 picture of the face of a monkey. Figure 6.c is a 256x256 picture of a girl. Figure 6.d is a 256x256 picture of an outdoor scene. Figure 6.e is a 512x512 aerial picture of the Purdue University campus. All of these pictures are digitized into 256 grey levels. To measure the performance of different algorithms on noisy pictures, contaminated images are constructed by adding both Gaussian (0,100) noise and 5% impulse noise to the originals in Figure 6. The generated impulse noise has only 2 grey levels, 0 (black) and 255 (white), with equal probability. In the robust model-based algorithm, Hampel's ψ-function is used in all experiments. The experiments are designed to clarify three aspects of the restoration process. First, the convergence of the restoration algorithm is demonstrated on these noisy pictures, and its rate of convergence is measured experimentally. Second, the mean square errors of three restoration algorithms, namely the model-based algorithm, the median filter, and the α-trimmed mean filter, are compared for different window sizes and different images. Third, the overall performance of the three restoration algorithms is compared qualitatively on different noisy images.
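A minimal sketch of the contamination procedure, assuming "Gaussian (0,100)" means zero mean and variance 100 (standard deviation 10) and that each pixel is independently replaced by an impulse with probability 0.05. Clipping the Gaussian-corrupted values to [0, 255] and the function name `contaminate` are our assumptions; the text does not specify them.

```python
import numpy as np


def contaminate(img, rng, var=100.0, p_impulse=0.05):
    """Add N(0, var) noise, then replace a fraction p_impulse of pixels by 0 or 255.

    Clipping to [0, 255] is an assumption; the text does not say how
    out-of-range values were handled.
    """
    noisy = img.astype(float) + rng.normal(0.0, np.sqrt(var), img.shape)
    noisy = np.clip(noisy, 0.0, 255.0)
    hit = rng.random(img.shape) < p_impulse   # pixels hit by impulse noise
    white = rng.random(img.shape) < 0.5       # 255 (white) or 0 (black), equally likely
    noisy[hit] = np.where(white[hit], 255.0, 0.0)
    return noisy
```

For example, `contaminate(original, np.random.default_rng(0))` produces one contaminated copy; a fixed seed makes comparisons between restoration methods repeatable.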

Convergence  of  Image  Restoration  Algorithm 

The robust model-based restoration algorithm is applied to the contaminated images, and the mean square error of the cleaned image is computed at each iteration.

Figures 7.a, 7.b, and 7.c plot the mean square error versus the number of iterations for the outdoor scene (Figure 6.d), the girl's image (Figure 6.c), and the bridge scene (Figure 6.a), respectively. The contaminated pictures are made by adding Gaussian (0,100) noise and 5% impulse noise to the images in Figure 6. The initial mean square errors are very large in all cases because of the added noise, but they decrease rapidly in the first two iterations, and the mean square error stabilizes in fewer than three iterations. The data cleaning method thus converges quickly (in fewer than three iterations).

Mean  Square  Error  Comparison  of  Image  Restoration  Methods 

Four image restoration methods are used in this experiment, with window sizes of 3x3, 5x5, and 7x7: the mean filter, the median filter, the α-trimmed mean filter with trimming ratio α = 0.15, and the robust model-based method. Popular choices of α lie in the range 0.1 to 0.15, and the filter performed best with α = 0.15 in our experiment. For the robust model-based method, a fixed window size of 8x8 is used. The 8x8 size is chosen for convenience; a small change of window size should not adversely affect the performance, because the fitted image model would not change significantly.

Four contaminated images are obtained from the originals in Figure 6 by the procedure explained in the previous section. The restoration methods discussed above are applied to these contaminated images, and the mean square errors of the restored images are computed. For the median, mean, and α-trimmed mean filters, the mean square error is computed for each window size; for the robust model-based method, the plotted mean square error is for the fixed window size of 8x8. The computed mean square error is plotted against window size.
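The conventional filters and the comparison of mean square error against window size can be sketched as follows. The α-trimmed mean here discards ⌊αN⌋ values at each end of the N sorted window values. That rounding rule, the edge-replicating border handling, and all function names are assumptions for illustration, not details taken from the text.

```python
import numpy as np


def alpha_trimmed_mean(values, alpha=0.15):
    """Mean of the window after discarding floor(alpha*N) values at each end."""
    v = np.sort(np.ravel(values))
    k = int(np.floor(alpha * v.size))
    return float(v[k:v.size - k].mean())


def window_filter(img, size, stat):
    """Replace every pixel by `stat` of its size x size neighbourhood."""
    pad = size // 2
    padded = np.pad(np.asarray(img, dtype=float), pad, mode="edge")
    out = np.empty(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = stat(padded[i:i + size, j:j + size])
    return out


def mse(a, b):
    """Mean square error between two images."""
    return float(np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2))


FILTERS = {
    "mean": np.mean,
    "median": np.median,
    "alpha-trimmed (0.15)": alpha_trimmed_mean,
}


def mse_vs_window(original, noisy, sizes=(3, 5, 7)):
    """MSE of each conventional filter for each window size."""
    return {name: {s: mse(original, window_filter(noisy, s, f)) for s in sizes}
            for name, f in FILTERS.items()}
```

The double loop keeps the sketch easy to follow; it is slow on 512x512 images, where a vectorized sliding-window implementation would be preferable.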

Figures 8.a, 8.b, 8.c, and 8.d plot the mean square errors obtained by the different methods for the outdoor scene (Figure 6.d), the girl's image (Figure 6.c), the bridge scene (Figure 6.a), and the aerial picture of the Purdue University campus (Figure 6.e), respectively. The results are consistent across all types of images. All traditional methods give relatively large mean square errors on most images, especially on images with many edges. For example, on the outdoor scene the minimum mean square errors of the mean filter, the median filter, and the α-trimmed mean filter are 690.1441, 651.1638, and 220.2222, respectively, whereas the mean square error of the robust model-based method is 103.9669. This large difference reflects the fact that the intensity in a window cannot be approximated by a constant because of edges and corners. The traditional methods reach their smallest mean square errors at window size 3x3 or 5x5, depending on the type of image. As expected, the mean filter performs worst on all images tested, and the median filter has a slightly lower mean square error than the mean filter. The α-trimmed mean filter performs better than the median or mean filter, but its mean square error is always larger than that of the robust model-based method on all images tested. The minimum mean square errors of the conventional methods are 220.2222 for the outdoor scene, 80.6720 for the girl, 92.1115 for the bridge, and 253.7658 for the Purdue campus, while those of our approach are 103.9669, 52.5648, 47.3367, and 189.1443, respectively. Thus the robust model-based method outperforms all the conventional methods on the tested images. The detailed comparison is summarized in Table III.

Table III. Mean square error comparison of different restoration methods on four different types of images.

| Image   | MSE of robust model method | MSE of mean filter | MSE of median filter | MSE of α-TM filter |
|---------|---------------------------:|-------------------:|---------------------:|-------------------:|
| Outdoor | 103.9669                   | 690.1441           | 651.1638             | 220.2222           |
| Girl    | 52.5648                    | 318.9122           | 300.3172             | 80.6720            |
| Bridge  | 47.3367                    | 264.6290           | 216.3370             | 92.1115            |
| Campus  | 189.1433                   | 453.9291           | 401.6255             | 253.7658           |
Qualitative Comparison of Image Restoration Methods

The noisy images and the images restored by the different algorithms are shown in Figures 9 and 10, which correspond to the originals in Figures 6.a and 6.b, respectively. The upper left corner of each of Figures 9 and 10 is the noisy picture, generated by adding white Gaussian (0,100) noise and 5% impulse noise to the original. It shows a typical salt-and-pepper noise pattern as well as Gaussian noise degradation. This noisy picture is the input to each restoration method.

The upper right corner of each of Figures 9 and 10 is the image restored by the robust model-based method, obtained after three iterations of the data cleaning process. The impulsive noise is almost completely removed, and the residual Gaussian noise is hardly noticeable. The fine details of the restored image are well preserved; in fact, almost all details of the original in Figure 6 are still clearly visible. For example, the guy wire of the bridge (Figure 9) and the hair of the monkey's face (Figure 10) have sharp edges, as in the original. This result shows an important ability of the model-based approach: it preserves edges and corners while still removing noise very effectively.

The lower left corner of each of Figures 9 and 10 is the image restored by the median filter with a 5x5 window. Note that, in the experiment of the previous section, the 5x5 and 3x3 windows gave the lowest mean square errors. Most of the impulse noise is removed in this picture, but it is much more blurred than the result of the robust model-based method. This blurring is easier to observe in images with many edges and corners than in images with large areas of constant grey level. The guy wire and details of the bridge frame (Figure 9) and the hair and eyes of the monkey's face (Figure 10) are blurred and cannot be made out in the median-filtered images. Regions with small intensity variations are replaced by a constant grey level, and the transitions between different regions are rather abrupt. This effect is typical of the median filter and arises because the median filter fails to smooth the image; it can be observed in the tower region of the bridge (Figure 9).

The lower right corner of each of Figures 9 and 10 is the image restored by the α-trimmed mean filter with a 5x5 window and α = 0.15; this choice of α is considered good in previous studies (Rey, 1983; Bickel, 1977). Even though the α-trimmed mean filter has a lower mean square error than the median filter, its restored image is more blurred than that of the median filter. Edges and corners convey more information to human perception, so the image restored by the α-trimmed mean filter looks worse than the median-filtered image in a visual comparison despite its smaller mean square error. For example, the tower and guy wire of the bridge (Figure 9) and the hair and eyes of the monkey's face (Figure 10) are blurred. The filter is also not very successful at removing impulse noise and leaves considerable residual noise caused by it, which can be observed in both images (Figures 9-10).
[Figure 7 plots: mean square error versus number of iterations.]

Figure 7. Convergence of mean square errors. (a) The outdoor scene. (b) The image of a girl. (c) The bridge scene.




[Figure 8 plots, panels (a)-(d): mean square error versus window size for each method; * marks the robust model-based method at its fixed window size of 8.]


Figure 8. Mean square error comparisons of different methods. (a) The outdoor scene. (b) The image of a girl. (c) The bridge scene. (d) Purdue campus scene.
Figure 10. Qualitative comparison for the monkey picture. Most details, such as hair and eyes, are clearly shown in the result of the model-based approach but are not clear in the others. (a) Contaminated image. (b) Robust model approach. (c) Median filter. (d) α-trimmed mean filter.


V.  COMPOSITE  EDGE  DETECTION. 


A.  Introduction 

Edge detection is an important topic in image processing in its own right and also a tool for the important problem of image segmentation. The traditional edge detection methods based on the windows of Roberts, Prewitt, or Sobel (Rosenfeld and Kak, 1982) rely on the fact that there is a sharp change in intensity on either side of an edge pixel. We call edges of this type step edges. Instead of the step function, other functions, such as the roof function (Brady, 1982), can be used to characterize the local intensity behavior near an edge. More recently, there have been attempts to characterize and detect edges by considering the intensity over a broad area around the edge pixels. Examples are the Laplacian on Gaussian operator (Marr and Hildreth, 1980), the difference of Gaussians (DOG) (Wilson and Bergen, 1979), facet model-based methods (Haralick, 1984), and causal autoregressive model-based methods (Zhou and Chellappa, 1986).

However, there is another mechanism that creates edges, which has recently received some attention. Consider the pixels at the boundary between two textures, say cotton canvas and raffia. There is no sharp intensity change at the boundary, yet any observer perceives a sharp edge there. We call such edge pixels texture edges. Recently there has been considerable interest in methods that can detect all the texture boundaries in a scene containing several textures (Kashyap and Eom, 1985a). These algorithms effectively locate most of the boundaries between textures that a human observer perceives. Of course, any real-life image, such as an outdoor or airport scene, contains both intensity edges and texture edges.

When the methods mentioned earlier are applied to outdoor scenes, the result is unsatisfactory for several reasons. For instance, the Laplacian on Gaussian approach and the facet model approach yield many microedges corresponding to the leaves of a tree or the inside of a shrub in the house image. These microedges convey little information and only add confusion. Even the edges due to runways or highways are often smeared, and texture boundaries are never sharply delineated. These methods cannot distinguish between the edges within a texture, such as wood, and the boundary between two textures, say wood and cork.

The texture-based algorithms also have their limitations. Since the windows or masks needed to detect or discriminate between textures are much bigger than those used in the other methods, these algorithms miss sharp edges such as highways, or runways at an airport.

The purpose of this section is to develop a composite edge detection approach that can detect all types of edges, including intensity edges and texture edges. We employ a two-stage approach. In the first stage, an algorithm determines all pixels in the image that are potential edges (either intensity edges or texture edges), together with the direction of each potential edge. In the second stage, each candidate edge pixel is submitted to two procedures: one tests whether it is a texture edge, and the other tests whether it is an intensity edge. We accept only those edges that pass at least one of the two tests. The test for a texture edge is a likelihood approach based on a causal autoregressive model; the test for a step edge is fairly conventional.

The comprehensive algorithm (Eom and Kashyap, 1987, 1989a, 1989b) presented here was applied to several images, both synthetic and real. The synthetic images are checkerboard images in which two different textures alternate; each texture has its own internal structure. The other two images are the outdoor scene and the airport image. We give the results of our algorithm and, to bring out the highlights of our approach, also the results of two popular edge detection approaches from the recent literature, the Laplacian on Gaussian method and the facet model approach, for all four images. The overall approach is shown in Figure 11.


[Block diagram from Original Image to Final Detected Edges.]


Figure  11.  Block  diagram  of  the  composite  edge  detection  algorithm 


B.  Edge  Hypothesis  Generation  (Algorithm  1) 

As indicated in Figure 11, the first step in the composite edge detection algorithm is to identify all potential edge pixels. In this process, all potential edge pixels should be detected, whether they are step edges, roof edges, or texture edges. Intensity edges, such as step edges or roof edges, have abrupt changes of intensity at the edge pixels and can be detected by a derivative operator. Intensity transitions also occur at texture boundaries, as well as at microedges inside each texture, and these too can be detected by a derivative operator. The algorithm used here is based on directional derivatives. We use 3x3 masks so that the edge pixels detected here are relatively sharp. Large mask operators are not adequate because they yield potential edge pixels that lie away from the true edge pixels.

Let $g(x,y)$ be the image intensity at position $(x,y)$. The first-order directional derivative in the direction $\alpha$ is given by

$$\frac{\partial g}{\partial \alpha} = \frac{\partial g}{\partial x}\cos\alpha + \frac{\partial g}{\partial y}\sin\alpha \tag{40}$$


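where $\partial g/\partial x$ and $\partial g/\partial y$ are the partial derivatives of $g$ in the $x$ and $y$ directions; they can be obtained by convolving with the following differencing operators $D_x$ and $D_y$:

$$D_x = \frac{1}{3}\begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}, \qquad D_y = \frac{1}{3}\begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix} \tag{41}$$

i.e.,

$$\frac{\partial g}{\partial x}(i,j) = \sum_{k,l=-1}^{1} g(i+k,\,j+l)\, D_x(k,l) \tag{42.a}$$

$$\frac{\partial g}{\partial y}(i,j) = \sum_{k,l=-1}^{1} g(i+k,\,j+l)\, D_y(k,l) \tag{42.b}$$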


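The angle of the gradient direction is

$$\alpha = \tan^{-1}\!\left(\frac{\partial g/\partial y}{\partial g/\partial x}\right) \tag{43}$$

Likewise, the second-order directional derivative is given by

$$\frac{\partial^2 g}{\partial \alpha^2} = \frac{\partial^2 g}{\partial x^2}\cos^2\alpha + 2\,\frac{\partial^2 g}{\partial x\,\partial y}\cos\alpha\,\sin\alpha + \frac{\partial^2 g}{\partial y^2}\sin^2\alpha \tag{44}$$

where the second-order partial derivatives $\partial^2 g/\partial x^2$, $\partial^2 g/\partial x\,\partial y$, and $\partial^2 g/\partial y^2$ are obtained by convolving $g$ with the following second-order differencing operators $D_{xx}$, $D_{xy}$, and $D_{yy}$: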
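$$D_{xx} = \begin{bmatrix} 1 & -2 & 1 \\ 1 & -2 & 1 \\ 1 & -2 & 1 \end{bmatrix}, \qquad D_{xy} = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 0 & 0 \\ -1 & 0 & 1 \end{bmatrix}, \qquad D_{yy} = \begin{bmatrix} 1 & 1 & 1 \\ -2 & -2 & -2 \\ 1 & 1 & 1 \end{bmatrix} \tag{45}$$

i.e.,

$$\frac{\partial^2 g}{\partial x^2}(i,j) = \sum_{k,l=-1}^{1} g(i+k,\,j+l)\, D_{xx}(k,l) \tag{46.a}$$

$$\frac{\partial^2 g}{\partial x\,\partial y}(i,j) = \sum_{k,l=-1}^{1} g(i+k,\,j+l)\, D_{xy}(k,l) \tag{46.b}$$

$$\frac{\partial^2 g}{\partial y^2}(i,j) = \sum_{k,l=-1}^{1} g(i+k,\,j+l)\, D_{yy}(k,l) \tag{46.c}$$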


An edge hypothesis is made at a pixel whose first-order directional derivative has a magnitude larger than a threshold $t_1$ and whose second-order directional derivative is negative, i.e.,

$$\left|\frac{\partial g}{\partial \alpha}\right| > t_1 \quad \text{and} \quad \frac{\partial^2 g}{\partial \alpha^2} < 0 \tag{47}$$


Note that the Prewitt operator is a special case of this directional derivative method that does not involve the second-order directional derivative.
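Algorithm 1 can be sketched with the masks of (41) and (45) as follows. The sketch evaluates the derivatives with the correlation sums (42) and (46), computes α of (43) with the four-quadrant arctangent so that it covers 0 to 360 degrees, and flags pixels satisfying (47). Edge-replicating borders, the threshold value and the function names `correlate3` and `edge_hypotheses` are illustrative assumptions; the second-order masks are used without normalization, exactly as printed, which does not affect the sign test.

```python
import numpy as np

# Masks indexed as D[k+1, l+1] for offsets k, l in {-1, 0, 1}; k is the row offset.
DX = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]) / 3.0
DY = np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]]) / 3.0
DXX = np.array([[1, -2, 1], [1, -2, 1], [1, -2, 1]], dtype=float)
DYY = np.array([[1, 1, 1], [-2, -2, -2], [1, 1, 1]], dtype=float)
DXY = np.array([[1, 0, -1], [0, 0, 0], [-1, 0, 1]], dtype=float)


def correlate3(g, mask):
    """sum_{k,l=-1..1} g(i+k, j+l) * mask(k, l), with edge-replicated borders."""
    g = np.asarray(g, dtype=float)
    p = np.pad(g, 1, mode="edge")
    rows, cols = g.shape
    out = np.zeros((rows, cols))
    for k in (-1, 0, 1):
        for l in (-1, 0, 1):
            out += mask[k + 1, l + 1] * p[1 + k:1 + k + rows, 1 + l:1 + l + cols]
    return out


def edge_hypotheses(g, t1):
    """Boolean map of potential edge pixels (eq. 47) and gradient angles in degrees."""
    gx, gy = correlate3(g, DX), correlate3(g, DY)
    gxx, gxy, gyy = correlate3(g, DXX), correlate3(g, DXY), correlate3(g, DYY)
    alpha = np.arctan2(gy, gx)                            # gradient angle, eq. (43)
    c, s = np.cos(alpha), np.sin(alpha)
    d1 = gx * c + gy * s                                  # eq. (40)
    d2 = gxx * c ** 2 + 2.0 * gxy * c * s + gyy * s ** 2  # eq. (44)
    hyp = (np.abs(d1) > t1) & (d2 < 0.0)                  # eq. (47)
    return hyp, np.degrees(alpha) % 360.0
```

On a vertical step from 0 to 100 between columns 3 and 4, both columns have a large first derivative, but only column 4 (the bright side) has a negative second derivative, so the hypothesis map is one pixel wide.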

The angle of the first derivative is given by $\alpha = \tan^{-1}\big((\partial g/\partial y)/(\partial g/\partial x)\big)$, evaluated with the signs of both partial derivatives taken into account, so it can take any value between 0 and 360 degrees. The edge direction is quantized into 4 directions, as defined in Table IV, so that a horizontal or vertical strip can be applied. Around each potential edge pixel, an $m \times 2n$ strip ($5 \times 16$ in this experiment) is constructed so that the strip is perpendicular to the approximated edge direction (Figure 12). For each potential edge pixel at the center of the strip, the following null hypothesis $H_0$ is assigned:

$H_0$: an edge exists in the given direction.

This hypothesis is tested by applying decision rules to the image strip; the details of the tests are given later.
Table IV. Quantization rule for the estimated edge direction

| Gradient angle (degrees) | Approximated direction | Type of strip |
|--------------------------|------------------------|---------------|
| 315-45                   | 0                      | horizontal    |
| 45-135                   | 2                      | vertical      |
| 135-225                  | 4                      | horizontal    |
| 225-315                  | 6                      | vertical      |
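Table IV amounts to a simple lookup. In this sketch (the function name `quantize_direction` is hypothetical), an angle lying exactly on a 45-degree boundary is assigned to the sector that starts at that angle; this tie-breaking rule is an assumption, since the table does not specify it.

```python
def quantize_direction(angle_deg):
    """Map a gradient angle in degrees to (direction code, strip type) per Table IV."""
    a = angle_deg % 360.0
    if a >= 315.0 or a < 45.0:
        return 0, "horizontal"
    if a < 135.0:
        return 2, "vertical"
    if a < 225.0:
        return 4, "horizontal"
    return 6, "vertical"
```

Reducing the angle modulo 360 first lets the function accept the output of a four-quadrant arctangent in either the (-180, 180] or the [0, 360) convention.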

[Figure 12 diagram: an m x 2n strip with the potential edge pixel at its center; the estimated edge direction and the estimated gradient direction are marked.]

Figure 12. An m x 2n strip of the image with the potential edge pixel at its center.

C.  Confirming  the  Presence  of  Edges  (Algorithm  2) 

The potential edge pixels selected by the edge hypothesis generation process of Section V.B are not the final edge pixels. Each potential edge may be an intensity edge, a texture edge, or a spurious edge (microedge) caused by intensity changes inside a texture. We want to detect only valid edges, i.e., intensity edges and texture edges, so microedges (spurious edges) must be deleted from the potential edge map. Valid edges therefore need to be confirmed at each potential edge pixel. Because intensity edges and texture edges are generated by different mechanisms, they must be confirmed by separate decision processes.

Therefore, two types of decision rules are needed to detect both texture edges and intensity edges. The first decision rule tests for the existence of a texture edge at the given position and edge direction; it is a likelihood ratio test based on statistical texture modelling. The second decision rule tests for the existence of an intensity edge; it is based on a weighted differencing operator. Pixels that fail both tests are deleted from the final edge map.
1. Confirming a Texture Edge


A texture edge in an image can be modelled as a boundary between two different texture regions, analogous to an intensity edge, which is modelled as a boundary between two different grey levels. Detecting a texture edge is much more difficult than detecting an intensity edge, because each texture region contains many microedges. Texture edges cannot be detected from the strength of gradient or Laplacian operators; a method that characterizes textures is needed first. A texture can be characterized by a small number of parameters after fitting an image model, such as a causal autoregressive model, to it.

Consider a horizontal strip of the image intensity array that is small enough to contain at most two different textures. If it contains two textures, the boundary between them can be assumed to be vertical. In this strip, a texture edge is defined as the boundary between the two textures.

Consider the strip around the candidate pixel defined earlier. Let the null and alternative hypotheses be

$H_0$: a texture edge exists at the given pixel and direction,
$H_1$: no texture edge exists at the given pixel and direction.

Under the hypothesis $H_0$, the texture to the left of the potential edge (region $\Omega_1$) and the texture to the right of it (region $\Omega_2$) differ from each other. These two textures are modelled by causal autoregressive models, defined in the regions $\Omega_1$ and $\Omega_2$ as follows:

$$g(i,j) = \theta_1^T z(i,j) + \sqrt{\rho_1}\,\omega(i,j), \quad 0 \le i \le m,\ 0 \le j \le n \quad (\text{region } \Omega_1) \tag{48}$$

$$g(i,j) = \theta_2^T z(i,j) + \sqrt{\rho_2}\,\omega(i,j), \quad 0 \le i \le m,\ n < j \le 2n \quad (\text{region } \Omega_2) \tag{49}$$

where $\{\omega(i,j)\}$ is a standard 2D white noise sequence, $\theta_1$ and $\theta_2$ are the parameter vectors for the regions $\Omega_1$ and $\Omega_2$, respectively, $\rho_1$ and $\rho_2$ are the corresponding noise variances, and $z(i,j)$ is the 4-vector

$$z(i,j) = \big[\, g(i,j-1),\; g(i-1,j),\; g(i-1,j-1),\; 1 \,\big]^T$$


Under the null hypothesis $H_0$, the parameters of the autoregressive models in the two regions $\Omega_1$ and $\Omega_2$ differ.

On the other hand, under the hypothesis $H_1$ the strip contains only one type of texture, which is also assumed to follow a causal autoregressive model:

$$g(i,j) = \theta_0^T z(i,j) + \sqrt{\rho_0}\,\omega(i,j), \quad 0 \le i \le m,\ 0 \le j \le 2n \quad (\text{region } \Omega_1 \cup \Omega_2) \tag{50}$$

where $\theta_0$ is the parameter vector and $z(i,j)$ is as defined above.

The decision rule based on the likelihood ratio test has the following form:

$$\begin{cases} \text{accept } H_0 & \text{if } \log p(g \mid H_0) - \log p(g \mid H_1) \ge K \\ \text{reject } H_0 & \text{if } \log p(g \mid H_0) - \log p(g \mid H_1) < K \end{cases} \tag{*}$$

where $K$ is a constant. The likelihood functions $\log p(g \mid H_0)$ and $\log p(g \mid H_1)$ for the autoregressive model are given in (Kashyap and Eom, 1988; Eom and Kashyap, 1989b); the proof can be found in (Kashyap, 1982).

Detecting texture edges by applying the decision rule (*) to the pixels with an edge hypothesis has two advantages over the texture boundary detection algorithms given in (Kashyap, 1985a). First, the new method estimates the texture edge direction, which gives more accurate edge detection than applying both horizontal and vertical strips. Second, the new method tests only for the existence of a texture edge, which makes processing much faster.
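The decision rule (*) can be sketched as below. The exact likelihood expressions are in the cited references and are not reproduced here, so this sketch substitutes the maximized conditional Gaussian log-likelihood of a least-squares-fitted causal AR model, $-\tfrac{N}{2}(\log 2\pi\hat\rho + 1)$. It also assumes an $m \times 2n$ strip whose region $\Omega_1$ is columns $0, \dots, n-1$ and region $\Omega_2$ columns $n, \dots, 2n-1$, with row 0 and column 0 used only as neighbours. All function names are hypothetical.

```python
import numpy as np


def _regression(g, cols):
    """z(i,j) = [g(i,j-1), g(i-1,j), g(i-1,j-1), 1] and targets g(i,j), i >= 1, j in cols."""
    i, j = np.meshgrid(np.arange(1, g.shape[0]), np.asarray(cols), indexing="ij")
    i, j = i.ravel(), j.ravel()
    z = np.column_stack([g[i, j - 1], g[i - 1, j], g[i - 1, j - 1], np.ones(i.size)])
    return z, g[i, j]


def ar_loglik(z, t, eps=1e-12):
    """Maximized conditional Gaussian log-likelihood of a causal AR model (assumed form)."""
    theta = np.linalg.lstsq(z, t, rcond=None)[0]
    rho = max(np.mean((t - z @ theta) ** 2), eps)  # ML estimate of the noise variance
    return -0.5 * t.size * (np.log(2.0 * np.pi * rho) + 1.0)


def texture_edge_statistic(strip, n):
    """log p(g|H0) - log p(g|H1) for a strip whose halves meet between columns n-1 and n."""
    g = np.asarray(strip, dtype=float)
    left, right = np.arange(1, n), np.arange(n, g.shape[1])
    log_h0 = ar_loglik(*_regression(g, left)) + ar_loglik(*_regression(g, right))
    log_h1 = ar_loglik(*_regression(g, np.arange(1, g.shape[1])))
    return log_h0 - log_h1


def accept_texture_edge(strip, n, K):
    """Decision rule (*): accept H0 (a texture edge) when the statistic is at least K."""
    return texture_edge_statistic(strip, n) >= K
```

Because $H_0$ fits separate parameters to the two halves, this statistic is non-negative up to rounding, so a useful threshold $K$ is positive; its value would have to be tuned.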


2.  Confirming  an  Intensity  Edge 

This decision rule tests for the existence of an intensity edge at a pixel with an edge hypothesis. The intensity edge is modelled as a step edge, and the decision is made on the output of a differencing operator with weighted averaging. Briefly, with the strip placed at the given pixel, the difference between the weighted averages of the grey levels on the two sides of the potential edge pixel is computed; if this difference exceeds a threshold, the pixel is accepted. This decision rule can also be extended to detect local maxima of the weighted differencing operator output instead of its strength. Let $W(i,j)$ be a weight function, which should be asymmetric with respect to the hypothetical edge pixel and direction. The output of the weighted differencing operator is then

$$\hat g = \sum_{i=0}^{m} \sum_{j=0}^{2n} W(i,j)\, g(i,j) \tag{51}$$


All edge detection window operators can be regarded as members of this class of weighted differencing operators. For example, the Prewitt, Roberts, and Sobel operators (Rosenfeld and Kak, 1982) are weighted differencing operators with appropriate weight functions that declare edges where the output is large, and the Laplacian on Gaussian operator (Marr and Hildreth, 1980) is also a weighted differencing operator, with a derivative-of-Gaussian weight function, that detects local maxima of the output. Many variations of the weight function are possible, but we restrict our attention to a simple operator that can detect step edges. Probably the simplest weighted differencing operator is the one with a uniform weight function, defined for the given strip as follows.


$$W(i,j) = \begin{cases} 1 & \text{if } 0 \le j \le n \\ -1 & \text{if } n < j \le 2n \end{cases} \tag{52}$$


The above operator is used in our experiment to decide whether an intensity edge exists at the potential edge pixel. The decision is based on the strength of the operator output, i.e.,

$$\begin{cases} \text{accept edge} & \text{if } \hat g > t \\ \text{reject edge} & \text{otherwise} \end{cases} \tag{53}$$

where $t$ is a constant.
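A sketch of the uniform-weight operator (52) and the decision rule (53). To give the two halves of an $m \times 2n$ strip equal weight, it uses $W = +1$ on columns $0, \dots, n-1$ and $W = -1$ on columns $n, \dots, 2n-1$, a slight departure from the index ranges printed in (52); the function names are hypothetical.

```python
import numpy as np


def weighted_difference(strip, n):
    """g_hat = sum_{i,j} W(i,j) g(i,j), with W = +1 on the first n columns, -1 on the rest."""
    g = np.asarray(strip, dtype=float)
    w = np.ones_like(g)
    w[:, n:] = -1.0
    return float(np.sum(w * g))


def accept_intensity_edge(strip, n, t):
    """Decision rule (53): accept the edge when g_hat exceeds the threshold t."""
    return weighted_difference(strip, n) > t
```

As printed, (53) compares $\hat g$ itself with $t$, so it responds only to edges whose left side is brighter; comparing $|\hat g|$ with $t$ would accept both polarities, but that variant is our suggestion rather than the rule in the text.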

Experimental results (Figures 13-15) show good performance with this simple decision rule.

D.  Experimental  Results 

The composite edge detection algorithm is tested on the following four images (Figure 13). Figure 13.a is a 128x128 image generated from two textures, grass and wood grain, chosen from Brodatz's photo album (Brodatz, 1966). The only major edges in this image lie at the boundaries between the two textures, but each square contains many weak edges caused by the textures.

Figure 13.b is a 128x128 test image generated by rotating a checkerboard image constructed in the same way as Figure 13.a, with the same textures. Its major edges are sloped at 45 degrees, and each diamond contains many weak edges caused by intensity changes within a texture. This image demonstrates that our method can detect edges that are neither horizontal nor vertical. Figure 13.c is a 256x256 monkey image.

Experiment  1:  Checker  Board  Image 

Figure 14.a is the final result of the composite edge detection algorithm with a low threshold in the decision rule for intensity edges. It shows the major edges detected at the boundaries between the two textures as well as weak edges inside each texture. The edges detected inside the textures are close to the actual edge locations. Figure 14.b is the result of the composite edge detection algorithm with a high threshold in the decision rule for intensity edges. It shows only the major edges between the two textures; most of the weak edges inside the textures are eliminated. Thus an investigator can examine the texture edges (the boundaries between textures) and the intensity edges separately.

Figure 14.c is the result of the Laplacian on Gaussian approach with σ = 0.5. Altering the parameters yields a final edge map similar to this one, so this approach cannot distinguish the edges caused by the boundaries between textures from the microedges within each texture.

Figure 14.d is the result of the facet model approach. It shows detected major edges and weak edges. Changing the parameters again yields a final edge map similar to Figure 14.d, so this approach does not distinguish texture edges from intensity edges either. A further noticeable defect is at the corners of the squares, where the detected edges are distorted.

Experiment  2:  Rotated  Checker  Board  Image 

Figure 15.a is the final result of the composite edge detection algorithm with a low threshold in the decision rule for the intensity edge. It shows all major edges between texture regions and the weak edges inside each texture region, and the locations of the detected edges correspond to the actual edge locations in the original image. Figure 15.b is the final result of the composite edge detection algorithm with a high threshold in the decision rule for the intensity edge. It shows all major edges between different texture regions, while most of the weak edges inside each texture region are removed without weakening the major edges.

Figure 15.c is the result of the Laplacian on Gaussian operator. It shows both major edges and weak edges, and the final edge map does not change when the parameters are changed. Therefore this approach does not distinguish texture edges from intensity edges. Figure 15.d is the result of the facet model approach. Its detected edges are severely distorted: it contains major edges between texture regions and weak edges inside each texture region, but changing the parameters does not separate the weak edges from the major edges.

Experiment  3:  Monkey  Image 

Figure 16.a shows the pixels having an edge hypothesis, obtained by the edge hypothesis generation process described in Section V.B. The edges are sharp, and the locations of these potential edge pixels are very close to the actual edge locations. For example, the eyes of the monkey and the lines in the center of the image are well detected, which shows that this process performs well as an edge detection method in its own right. Its performance as an edge detector is superior to that of the Laplacian on Gaussian and facet model methods (Figures 16.c and 16.d).

Figure 16.b is the final result of the composite edge detection algorithm. Notice that most of the microedges in the texture region of the cheeks of the monkey's face are removed, while most of the important edges, such as the eyes and nose of the monkey and the lines in the center of the picture, are well preserved.

Figure 16.c is the result of the Laplacian on Gaussian operator. It contains many unwanted microedges caused by the texture of the cheeks of the monkey's face, and its major edges are distorted: the edges in the eyes and nose region are barely distinguishable.

Figure 16.d is the result of the facet model approach. The detected edges are distorted and contain many false (spurious) edges, and their locations are relatively far from the actual edge locations.

E.  Discussions  and  Conclusions 

Edges arise in at least two different ways, namely from differences in intensity (intensity edges) and from differences in texture (texture edges). The examples demonstrate the importance of texture edges. Conventional edge detection algorithms cannot distinguish between texture edges and intensity edges; a new edge detection algorithm that detects both intensity and texture edges has been developed here.

The performance of the composite edge detection algorithm in these experiments can be summarized in the following two points.

1. The edge hypothesis generation procedure developed in this research can be used as an edge detection method on its own, and its performance as an edge detector is better than that of the Laplacian on Gaussian and facet model methods.

2. The composite edge detection algorithm is flexible enough to detect both major and weak edges by changing the threshold. In other words, with a high threshold it detects only the major edges, without the microedges caused by texture, and with a lower threshold it detects both the major edges and the microedges.


Figure 14. Comparison with checker board image: (a) edges detected by composite edge detection algorithm with low threshold, (b) edges detected by composite edge detection algorithm with high threshold, (c) result of Laplacian on Gaussian method, (d) result of facet model method

Figure 15. Comparison with rotated checker board image: (a) edges detected by composite edge detection algorithm with low threshold, (b) edges detected by composite edge detection algorithm with high threshold, (c) result of Laplacian on Gaussian method, (d) result of facet model method

Figure 16. Comparison with monkey image: (a) potential edges (pixels having edge hypothesis) detected by algorithm 1, (b) composite edges detected by algorithm 2 (composite edge detection algorithm), (c) result of Laplacian on Gaussian method, (d) result of facet model method


VI. SUMMARY AND CONCLUSIONS. In this study, robust image models are investigated and applied to several important image processing problems. Robust image models have potential applications in many problems arising in image processing and computer vision. Image models are already used in image synthesis, texture analysis, image coding, and image segmentation, but they are generally not robust to outliers. We applied the robust image models to two important problems in image processing, namely image restoration and edge detection, and compared the robust model-based methods experimentally with conventional methods. Sections IV and V show the advantage of the robust model-based methods over conventional methods in some image processing problems.
ABSTRACT. The Stroh formalism for anisotropic elastic materials has contributed much to the determination of solutions to anisotropic elasticity problems. In most cases, however, the solutions are in complex form, and it is desirable to have them in real form for practical applications. This requires new identities, or sum rules, that relate the eigenvalues and eigenvectors associated with the anisotropic elastic constants to real quantities. The identities serve two important purposes. First, with the identities the problem of repeated eigenvalues disappears. Second, the identities enable us to express the final solutions to anisotropic elasticity problems in real form. The identities and the structural properties of certain real matrices in the solution are the keys to solving heretofore unsolved problems and to simplifying existing complex solutions to real form. As a result, some interesting, previously unnoticed phenomena have been revealed. For instance, it was discovered only recently that the surface traction on any radial plane in an anisotropic elastic material due to a concentrated force and a line dislocation applied at the origin is independent of the choice of the radial plane.

INTRODUCTION. Following the work of Eshelby et al. [1], Stroh in 1958 [2] and 1962 [3] developed a powerful and elegant formalism for treating a certain class of two-dimensional problems involving dislocations, line forces and steady-state waves in anisotropic elastic solids. The formalism is well known in the physics and materials science communities (see [4-10], for example). Unlike the two-dimensional anisotropic solutions developed by Green and Zerna [11], which are restricted to plane strain deformations and hence to monoclinic materials, the Stroh formalism applies to general anisotropic elastic materials, for which all three displacement components are necessarily coupled. Also, unlike Lekhnitskii's formalism [12], which breaks down for orthotropic materials [13] and requires special treatment [14], the Stroh formalism has no such restrictions. An excellent review of the Stroh formalism can be found in [8].

The basic elements of the Stroh formalism are the eigenvalues $p$ and the eigenvectors $\xi$ associated with the anisotropic elastic constants. The solution to an anisotropic elasticity problem is, in general, expressed in terms of the $p$'s and $\xi$'s, which are complex. There are identities, or sum rules, that express certain combinations of the $p$'s and $\xi$'s in terms of the real matrices $N_i$ ($i = 1,2,3$), $S$, $H$ and $L$ defined later. The identities enable us to rewrite the complex solutions in real form, and the structures of $N_i$, $S$, $H$, $L$ give in-depth information on the physical properties of the final solution.

We outline the Stroh formalism in Section 2, where the eigenvalues $p$ and eigenvectors $\xi$ are defined. Problems arise when $p$ has a repeated root; the modifications required are given in Section 3. The orthogonality relations between the eigenvectors are presented in Section 4, and basic identities between $p$, $\xi$ and the real matrices $S$, $H$, $L$ are derived. In Section 5, an alternative expression for $S$, $H$, $L$ due to Barnett and Lothe is presented, together with the structures of $S$, $H$, $L$ and $N_1$, $N_3$. In the last section, we present new identities that are useful in solving certain problems in anisotropic elastic materials and composites.


STROH FORMALISM. In a fixed rectangular coordinate system $x_i$, $i = 1,2,3$, let $u_i$ and $\sigma_{ij}$ be the displacement and the stress, respectively. The equations of equilibrium and the stress-strain laws can be written as

$$\sigma_{ij,j} = 0, \tag{1}$$

$$\sigma_{ij} = C_{ijks}\,u_{k,s}, \tag{2}$$

in which repeated indices imply summation, a comma stands for partial differentiation, and $C_{ijks}$ are the elastic constants, which possess the usual symmetry properties

$$C_{ijks} = C_{jiks} = C_{ksij}.$$

Consider a two-dimensional deformation in which $u_k$, $k = 1,2,3$, depend on $x_1$ and $x_2$ only. The general solution has the form

$$u_k = a_k f(z), \tag{3}$$

$$z = x_1 + p x_2, \tag{4}$$

where $f$ is an arbitrary function of $z$, and $p$ and $a_k$ are constants. In matrix notation, they are determined by

$$\{Q + p(R + R^T) + p^2 T\}\,a = 0, \tag{5}$$

in which the superscript $T$ stands for the transpose and

$$Q_{ik} = C_{i1k1}, \qquad R_{ik} = C_{i1k2}, \qquad T_{ik} = C_{i2k2}. \tag{6}$$

Equation (5) is obtained by substituting (3) into (2) and (1). We note that $Q$ and $T$ are symmetric and, subject to the positiveness of the strain energy, positive definite.
Introducing the new vector

$$b = (R^T + pT)\,a = -\frac{1}{p}(Q + pR)\,a, \tag{7}$$

in which the second equality comes from (5), the stress obtained by substituting (3) into (2) can be written as

$$\sigma_{i1} = -\phi_{i,2}, \qquad \sigma_{i2} = \phi_{i,1}, \qquad \phi = b\,f(z). \tag{8}$$

Thus $\phi$ is the stress function.


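The two equations in (7) can be rewritten as the standard eigenrelation

$$N\xi = p\,\xi, \tag{9}$$

where

$$N = \begin{bmatrix} N_1 & N_2 \\ N_3 & N_1^T \end{bmatrix}, \qquad \xi = \begin{bmatrix} a \\ b \end{bmatrix}, \tag{10}$$

$$N_1 = -T^{-1}R^T, \qquad N_2 = T^{-1} = N_2^T, \qquad N_3 = RT^{-1}R^T - Q = N_3^T.$$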

We see that $N_2$ and $N_3$ are symmetric and that $N_2$ is positive definite. It is shown in [15] that $-N_3$ is positive semi-definite. Equation (9) provides six eigenvalues $p_\alpha$, $\alpha = 1,2,\ldots,6$, and six associated eigenvectors $\xi_\alpha$. Since $p_\alpha$ cannot be real if the strain energy is positive [1], we let

$$p_{\alpha+3} = \bar{p}_\alpha, \quad \mathrm{Im}\,p_\alpha > 0, \qquad \xi_{\alpha+3} = \bar{\xi}_\alpha, \qquad \alpha = 1,2,3,$$
where an overbar denotes the complex conjugate and Im stands for the imaginary part. The general solution for the displacement and stress function given by (3) and (8) can be written as

$$u = \sum_{\alpha=1}^{3}\{a_\alpha f_\alpha(z_\alpha) + \bar{a}_\alpha f_{\alpha+3}(\bar{z}_\alpha)\},$$

$$\phi = \sum_{\alpha=1}^{3}\{b_\alpha f_\alpha(z_\alpha) + \bar{b}_\alpha f_{\alpha+3}(\bar{z}_\alpha)\},$$

$$z_\alpha = x_1 + p_\alpha x_2,$$

where $f_1, f_2, \ldots, f_6$ are arbitrary complex functions of their arguments.
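The following Python sketch (NumPy only) is a numerical check of (5)-(10), not part of the original paper. It assembles $N$ from a stiffness tensor and confirms three things: every eigenpair satisfies (5), every eigenpair satisfies the first equality in (7), and the eigenvalues occur in complex-conjugate pairs with none real. The material is hypothetical (a random positive definite 6x6 Voigt stiffness matrix). The helper names `voigt`, `stiffness_tensor`, `qrt` and `stroh_N` are ours.

```python
import numpy as np

def voigt(i, j):
    # 0-based Voigt index: 11->0, 22->1, 33->2, 23->3, 13->4, 12->5
    return i if i == j else 6 - i - j

def stiffness_tensor(Cv):
    # expand a symmetric 6x6 Voigt matrix into C_ijks with the full symmetries
    C = np.empty((3, 3, 3, 3))
    for i, j, k, s in np.ndindex(3, 3, 3, 3):
        C[i, j, k, s] = Cv[voigt(i, j), voigt(k, s)]
    return C

def qrt(C):
    # Q_ik = C_i1k1, R_ik = C_i1k2, T_ik = C_i2k2 of eq. (6), written with 0-based indices
    return C[:, 0, :, 0], C[:, 0, :, 1], C[:, 1, :, 1]

def stroh_N(C):
    # the 6x6 Stroh matrix of eq. (10)
    Q, R, T = qrt(C)
    Ti = np.linalg.inv(T)
    N1 = -Ti @ R.T
    return np.block([[N1, Ti], [R @ Ti @ R.T - Q, N1.T]])

# hypothetical material: random symmetric positive definite Voigt matrix
G = np.random.default_rng(0).standard_normal((6, 6))
C = stiffness_tensor(G @ G.T + 6.0 * np.eye(6))

Q, R, T = qrt(C)
p, xi = np.linalg.eig(stroh_N(C))
a, b = xi[:3], xi[3:]

# residual of eq. (5) and of b = (R^T + pT) a from eq. (7), over all six eigenpairs
res5 = max(np.linalg.norm((Q + pk * (R + R.T) + pk**2 * T) @ a[:, k]) for k, pk in enumerate(p))
res7 = max(np.linalg.norm((R.T + pk * T) @ a[:, k] - b[:, k]) for k, pk in enumerate(p))
print(np.round(p, 4))
print(res5, res7)
```

Because $N$ is real, `numpy.linalg.eig` returns the complex eigenvalues in exact conjugate pairs, which is what the relations $p_{\alpha+3} = \bar{p}_\alpha$ assume.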

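For $u$ and $\phi$ to be real, we let

$$f_{\alpha+3} = \bar{f}_\alpha, \qquad \alpha = 1,2,3,$$

and write the general solution as

$$u = 2\,\mathrm{Re}\Big\{\sum_{\alpha=1}^{3} a_\alpha f_\alpha(z_\alpha)\Big\}, \qquad \phi = 2\,\mathrm{Re}\Big\{\sum_{\alpha=1}^{3} b_\alpha f_\alpha(z_\alpha)\Big\}, \tag{11}$$

or

$$w = 2\,\mathrm{Re}\Big\{\sum_{\alpha=1}^{3} \xi_\alpha f_\alpha(z_\alpha)\Big\}, \qquad w = \begin{bmatrix} u \\ \phi \end{bmatrix}, \qquad \xi_\alpha = \begin{bmatrix} a_\alpha \\ b_\alpha \end{bmatrix}, \tag{12}$$

where Re stands for the real part. We observe that $w$ satisfies the differential equation

$$w_{,2} = N\,w_{,1}.$$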

DEGENERATE MATERIALS. Equations (11) or (12) are complete when the 6x6 matrix $N$ in (9) is simple, i.e., when the eigenvalues $p_\alpha$ of $N$ are distinct. They remain complete when $N$ is semisimple, i.e., when $p_\alpha$ has a repeated eigenvalue and the associated eigenvectors are independent. If $N$ is non-semisimple, i.e., if $p_\alpha$ has a repeated eigenvalue but the eigenvectors $\xi_\alpha$ are not all independent, the solution given by (12) is not complete. Anisotropic elastic materials for which $N$ is non-semisimple are called degenerate materials. Isotropic materials are a special class of degenerate materials: for isotropic materials we have $p_1 = p_2 = i$ and $\xi_1 = \xi_2$. In fact, $p_3 = i$ also, but $\xi_3 \neq \xi_1$.

For degenerate materials for which $\xi_2 = \xi_1$, (12) is replaced by [16,17]

$$w = 2\,\mathrm{Re}\Big\{\xi_1 f_1(z_1) + \frac{\partial}{\partial p_1}\big[\xi_1 f_2(z_1)\big] + \xi_3 f_3(z_3)\Big\} = 2\,\mathrm{Re}\big\{\xi_1 f_1(z_1) + \xi_1' f_2(z_1) + x_2\,\xi_1 f_2'(z_1) + \xi_3 f_3(z_3)\big\},$$

in which $\xi_1' = \partial\xi/\partial p$ at $p = p_1$ satisfies the following equation, obtained by differentiating (9) with respect to $p$ and setting $p = p_1$:

$$(N - p_1 I)\,\xi_1' = \xi_1.$$

We see that for degenerate materials the solution loses the regular structure of the solution (12) for general anisotropic materials. This happens not only for the general solution but also in applications in which the final solution has a simple form for general anisotropic materials but a complicated expression for degenerate materials.

It is therefore desirable to have an expression that holds regardless of whether the material is degenerate or not. This means that we need the solution in a form that does not contain the eigenvalues $p$ and the eigenvectors $\xi$ of the 6x6 matrix $N$. We could achieve this if we had identities relating $p$ and $\xi$ to real quantities represented by $N$, or by quantities derivable from $N$. This is the main subject of the following sections.

THE ORTHOGONALITY RELATIONS. The left eigenvector $\eta$ of $N$ is defined by

$$\eta^T N = p\,\eta^T,$$

or

$$N^T \eta = p\,\eta. \tag{13}$$

The left eigenvectors $\eta$ and the right eigenvectors $\xi$ associated with different eigenvalues $p$ are biorthogonal to each other [18]. If $N$ is simple, we can normalize the eigenvectors such that

$$\eta_\alpha^T \xi_\beta = \delta_{\alpha\beta}, \tag{14}$$

where $\delta_{\alpha\beta}$ is the Kronecker delta. If $N$ is semisimple, (14) remains valid because the eigenvectors associated with the repeated eigenvalue can be chosen so that (14) holds. If $N$ is non-semisimple, (14) is not valid for the repeated eigenvalue; a modified relation can be found in [8]. If we introduce the 6x6 matrices $U$ and $V$ by

$$U = [\xi_1, \xi_2, \ldots, \xi_6], \qquad V = [\eta_1, \eta_2, \ldots, \eta_6],$$

(14) can be written as

$$V^T U = I,$$

where $I$ is the 6x6 identity matrix. This implies that $V^T$ and $U$ are inverses of each other, and hence the product commutes, i.e.,

$$U V^T = I. \tag{15}$$


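Denoting the 6x6 matrix $J$ by

$$J = \begin{bmatrix} 0 & I \\ I & 0 \end{bmatrix},$$

$I$ in this context being the 3x3 identity matrix, it can be shown that

$$JN = (JN)^T = N^T J.$$

It follows from (9) and (13) that we may set, without loss of generality,

$$\eta = J\xi. \tag{16}$$

If we define the 3x3 matrices $A$ and $B$ by

$$A = [a_1, a_2, a_3], \qquad B = [b_1, b_2, b_3],$$

we have

$$U = \begin{bmatrix} A & \bar{A} \\ B & \bar{B} \end{bmatrix}, \qquad V = JU = \begin{bmatrix} B & \bar{B} \\ A & \bar{A} \end{bmatrix}.$$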
Equation (15) leads to, after carrying out the matrix multiplications,

$$A A^T + \bar{A}\bar{A}^T = 0 = B B^T + \bar{B}\bar{B}^T, \tag{17}$$

$$A B^T + \bar{A}\bar{B}^T = I = B A^T + \bar{B}\bar{A}^T. \tag{18}$$

Equations (17) imply that $AA^T$ and $BB^T$ are purely imaginary, while (18) tells us that the real part of $AB^T$ is $\tfrac{1}{2}I$. Hence we let
$$S = i(2AB^T - I), \qquad H = 2iAA^T = H^T, \qquad L = -2iBB^T = L^T, \tag{19}$$

where $S$, $H$, $L$ are 3x3 real matrices first introduced by Barnett and Lothe [6]. $H$ and $L$ are symmetric and can be shown to be positive definite [8,15,19]. Noting that (19) can be written as


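$$\begin{bmatrix} S & H \\ -L & S^T \end{bmatrix} = i\left\{2\begin{bmatrix} A \\ B \end{bmatrix}\begin{bmatrix} B^T & A^T \end{bmatrix} - I\right\}, \tag{20}$$

and using the following relations, which are deduced from (14) and (16),

$$\begin{bmatrix} B^T & A^T \end{bmatrix}\begin{bmatrix} A \\ B \end{bmatrix} = I, \qquad \begin{bmatrix} B^T & A^T \end{bmatrix}\begin{bmatrix} \bar{A} \\ \bar{B} \end{bmatrix} = 0,$$

we have [8]

$$\begin{bmatrix} S & H \\ -L & S^T \end{bmatrix}\begin{bmatrix} A \\ B \end{bmatrix} = i\begin{bmatrix} A \\ B \end{bmatrix}, \qquad \begin{bmatrix} S & H \\ -L & S^T \end{bmatrix}\begin{bmatrix} \bar{A} \\ \bar{B} \end{bmatrix} = -i\begin{bmatrix} \bar{A} \\ \bar{B} \end{bmatrix}.$$

This leads to

$$\begin{bmatrix} S & H \\ -L & S^T \end{bmatrix}\begin{bmatrix} S & H \\ -L & S^T \end{bmatrix} = -I,$$

or

$$HL - SS = I, \qquad SH + HS^T = 0, \qquad LS + S^TL = 0. \tag{21}$$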


We see that $S$, $H$, $L$ are not independent of each other. We also see from (21)$_{2,3}$ and their counterparts

$$H^{-1}S + S^T H^{-1} = 0, \qquad S L^{-1} + L^{-1}S^T = 0,$$

that $SH$, $LS$, $H^{-1}S$ and $SL^{-1}$ are antisymmetric.
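As an illustration of (14)-(21), the sketch below computes $S$, $H$, $L$ from the eigenvectors via (19) for a hypothetical, non-degenerate anisotropic material (a random positive definite Voigt matrix). It then checks that they are real, that $H$ and $L$ are symmetric positive definite, and that (21) holds. Scaling each column so that $2a_\alpha \cdot b_\alpha = 1$ (a bilinear product, no complex conjugation) is our reading of (14) combined with (16). The helper functions are our own.

```python
import numpy as np

def voigt(i, j):
    # 0-based Voigt index: 11->0, 22->1, 33->2, 23->3, 13->4, 12->5
    return i if i == j else 6 - i - j

def stiffness_tensor(Cv):
    # expand a symmetric 6x6 Voigt matrix into C_ijks
    C = np.empty((3, 3, 3, 3))
    for i, j, k, s in np.ndindex(3, 3, 3, 3):
        C[i, j, k, s] = Cv[voigt(i, j), voigt(k, s)]
    return C

def stroh_N(C):
    # Q, R, T of eq. (6) and the 6x6 matrix N of eq. (10)
    Q, R, T = C[:, 0, :, 0], C[:, 0, :, 1], C[:, 1, :, 1]
    Ti = np.linalg.inv(T)
    N1 = -Ti @ R.T
    return np.block([[N1, Ti], [R @ Ti @ R.T - Q, N1.T]])

def stroh_AB(C):
    # the three eigenpairs with Im p > 0; columns scaled so that 2 a_k . b_k = 1
    p, xi = np.linalg.eig(stroh_N(C))
    keep = np.argsort(-p.imag)[:3]
    A, B = xi[:3, keep], xi[3:, keep]
    scale = np.sqrt(2.0 * np.einsum("ia,ia->a", A, B))  # bilinear, no conjugation
    return p[keep], A / scale, B / scale

# hypothetical non-degenerate material
G = np.random.default_rng(0).standard_normal((6, 6))
C = stiffness_tensor(G @ G.T + 6.0 * np.eye(6))

p, A, B = stroh_AB(C)
I3 = np.eye(3)
S_c = 1j * (2.0 * A @ B.T - I3)   # eq. (19)
H_c = 2j * A @ A.T
L_c = -2j * B @ B.T

# (19) says these matrices are real: record the size of any imaginary residue
imag_residue = max(np.abs(X.imag).max() for X in (S_c, H_c, L_c))
S, H, L = S_c.real, H_c.real, L_c.real

checks = {
    "HL - SS - I": np.abs(H @ L - S @ S - I3).max(),   # eq. (21), first relation
    "SH + HS^T": np.abs(S @ H + H @ S.T).max(),        # eq. (21), second relation
    "LS + S^T L": np.abs(L @ S + S.T @ L).max(),       # eq. (21), third relation
}
print(imag_residue, checks)
```

The route through eigenvectors is exactly what fails for degenerate materials, which motivates the integral formalism discussed next.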


THE STRUCTURE OF S, H, L AND N1, N3. The three real matrices $S$, $H$, $L$, which are the Barnett-Lothe tensors, appear very often in the solutions to anisotropic elasticity problems (see [7,8,20,21], for example). The expressions given by (19) are based on the assumption that the eigenvectors $\xi$ span a six-dimensional space. When $N$ is non-semisimple, we do not have six independent eigenvectors and (19) is not valid. In fact, one encounters problems also when $N$ is almost non-semisimple. A modified expression in place of (19), valid when $N$ is non-semisimple or almost non-semisimple, was presented in [22]; it applies to simple $N$ as well.


An alternative approach, which avoids the determination of eigenvectors, is the integral formalism introduced by Barnett and Lothe. We generalize the matrices $Q$, $R$, $T$ defined in (6) to

$$Q_{ik}(\theta) = C_{ijks}n_j n_s, \qquad R_{ik}(\theta) = C_{ijks}n_j m_s, \qquad T_{ik}(\theta) = C_{ijks}m_j m_s, \tag{22}$$

in which $\theta$ is a real parameter and

$$n = (\cos\theta, \sin\theta, 0), \qquad m = (-\sin\theta, \cos\theta, 0).$$

When $\theta = 0$, (22) reduces to (6). With $Q$, $R$, $T$ defined by (22), the three 3x3 matrices $N_i$ and the 6x6 matrix $N$ of (10) also depend on $\theta$. Equation (9) now becomes

$$N(\theta)\,\xi = p(\theta)\,\xi. \tag{23}$$

It can be shown that when the $p_\alpha(\theta)$ are distinct, the $\xi_\alpha$ are independent of $\theta$. It can also be shown that $p(\theta)$ is related to $p(0) = p$ of (9) by [8,17]

$$p(\theta) = \frac{p\cos\theta - \sin\theta}{p\sin\theta + \cos\theta} = \frac{d}{d\theta}\ln(\cos\theta + p\sin\theta). \tag{24}$$
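A short numerical check of (23) and (24), under the same assumptions as the earlier sketches (a hypothetical random material, our own helper names). The eigenvectors of $N(0)$ are reused unchanged as eigenvectors of $N(\theta)$, with the eigenvalues predicted by (24).

```python
import numpy as np

def voigt(i, j):
    # 0-based Voigt index: 11->0, 22->1, 33->2, 23->3, 13->4, 12->5
    return i if i == j else 6 - i - j

def stiffness_tensor(Cv):
    C = np.empty((3, 3, 3, 3))
    for i, j, k, s in np.ndindex(3, 3, 3, 3):
        C[i, j, k, s] = Cv[voigt(i, j), voigt(k, s)]
    return C

def stroh_N(C, theta=0.0):
    # Q, R, T of eq. (22) and N(theta) assembled as in eq. (10)
    n = np.array([np.cos(theta), np.sin(theta), 0.0])
    m = np.array([-np.sin(theta), np.cos(theta), 0.0])
    Q = np.einsum("ijks,j,s->ik", C, n, n)
    R = np.einsum("ijks,j,s->ik", C, n, m)
    T = np.einsum("ijks,j,s->ik", C, m, m)
    Ti = np.linalg.inv(T)
    N1 = -Ti @ R.T
    return np.block([[N1, Ti], [R @ Ti @ R.T - Q, N1.T]])

G = np.random.default_rng(0).standard_normal((6, 6))
C = stiffness_tensor(G @ G.T + 6.0 * np.eye(6))   # hypothetical material

p, xi = np.linalg.eig(stroh_N(C))   # p = p(0) and xi from eq. (9)
residuals = []
for theta in (0.3, 1.2, 2.5):
    # eq. (24): predicted eigenvalues of N(theta)
    pt = (p * np.cos(theta) - np.sin(theta)) / (p * np.sin(theta) + np.cos(theta))
    # eq. (23) with the SAME eigenvectors: column k should satisfy N(theta) xi_k = pt_k xi_k
    residuals.append(np.abs(stroh_N(C, theta) @ xi - xi * pt).max())
print(residuals)
```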


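We now consider the integrals

$$S(\theta) = \frac{1}{\pi}\int_0^\theta N_1(\omega)\,d\omega, \qquad H(\theta) = \frac{1}{\pi}\int_0^\theta N_2(\omega)\,d\omega, \qquad L(\theta) = -\frac{1}{\pi}\int_0^\theta N_3(\omega)\,d\omega. \tag{25}$$

When $\theta = \pi$, the integrals in (25) are called complete integrals. Barnett and Lothe [6] proved that $S$, $H$, $L$ of (19) are identical to the complete integrals

$$S = S(\pi), \qquad H = H(\pi), \qquad L = L(\pi). \tag{26}$$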


This provides an alternative way of determining the three real matrices $S$, $H$, $L$. In (26) the need to determine the eigenvectors is circumvented, and hence the problem of repeated eigenvalues disappears.
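The Python sketch below compares the two routes to $S$, $H$, $L$ for a hypothetical non-degenerate material. The first route evaluates the complete integrals (25)-(26) numerically; the second uses the eigenvector formulas (19). Because $N(\theta + \pi) = N(\theta)$ (both $n$ and $m$ change sign, and $Q$, $R$, $T$ are quadratic in them), the trapezoid rule over one period reduces to averaging equally spaced samples and converges rapidly. The sample count `n` is an arbitrary choice.

```python
import numpy as np

def voigt(i, j):
    return i if i == j else 6 - i - j

def stiffness_tensor(Cv):
    C = np.empty((3, 3, 3, 3))
    for i, j, k, s in np.ndindex(3, 3, 3, 3):
        C[i, j, k, s] = Cv[voigt(i, j), voigt(k, s)]
    return C

def stroh_N(C, theta=0.0):
    # Q, R, T of eq. (22) and N(theta) of eq. (10)
    n = np.array([np.cos(theta), np.sin(theta), 0.0])
    m = np.array([-np.sin(theta), np.cos(theta), 0.0])
    Q = np.einsum("ijks,j,s->ik", C, n, n)
    R = np.einsum("ijks,j,s->ik", C, n, m)
    T = np.einsum("ijks,j,s->ik", C, m, m)
    Ti = np.linalg.inv(T)
    N1 = -Ti @ R.T
    return np.block([[N1, Ti], [R @ Ti @ R.T - Q, N1.T]])

def complete_integrals(C, n=2048):
    # eqs. (25)-(26) at theta = pi; N has period pi, so the trapezoid rule is a plain mean
    thetas = np.arange(n) * np.pi / n
    Nbar = np.mean([stroh_N(C, th) for th in thetas], axis=0)
    return Nbar[:3, :3], Nbar[:3, 3:], -Nbar[3:, :3]    # S, H, L

def eigen_route(C):
    # eq. (19) with the columns of A, B scaled so that 2 a_k . b_k = 1
    p, xi = np.linalg.eig(stroh_N(C))
    keep = np.argsort(-p.imag)[:3]
    A, B = xi[:3, keep], xi[3:, keep]
    scale = np.sqrt(2.0 * np.einsum("ia,ia->a", A, B))
    A, B = A / scale, B / scale
    S = (1j * (2.0 * A @ B.T - np.eye(3))).real
    return S, (2j * A @ A.T).real, (-2j * B @ B.T).real

G = np.random.default_rng(1).standard_normal((6, 6))
C = stiffness_tensor(G @ G.T + 6.0 * np.eye(6))   # hypothetical material

diffs = [np.abs(X - Y).max() for X, Y in zip(complete_integrals(C), eigen_route(C))]
print(diffs)   # largest entrywise differences for S, H, L
```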

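Equations (25) can be integrated explicitly for isotropic materials. For $\theta = \pi$, we have

$$S = \begin{bmatrix} 0 & -s & 0 \\ s & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad H = \frac{1}{\mu}\begin{bmatrix} (1-s^2)/\gamma & 0 & 0 \\ 0 & (1-s^2)/\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad L = \mu\begin{bmatrix} \gamma & 0 & 0 \\ 0 & \gamma & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

$$s = \frac{1-2\nu}{2(1-\nu)}, \qquad \gamma = \frac{1}{1-\nu}, \tag{27}$$

in which $\mu$ and $\nu$ are the shear modulus and Poisson's ratio, respectively. Complete integrals of (25) for transversely isotropic materials can be found in [8], but those for more general anisotropic materials have not been available.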
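For an isotropic material the eigenvector route fails, because the material is degenerate, but the integrals (25) still apply. The sketch below evaluates them numerically and compares the result with the closed form (27). The values $\mu = 1$ and $\nu = 0.3$ are arbitrary, and $\lambda = 2\mu\nu/(1-2\nu)$ is the usual Lamé constant. Here $N(\theta)$ contains only $\cos 2\theta$ and $\sin 2\theta$ terms, so a modest number of samples already gives the average to rounding accuracy.

```python
import numpy as np

def isotropic_tensor(lam, mu):
    # C_ijks = lam d_ij d_ks + mu (d_ik d_js + d_is d_jk)
    d = np.eye(3)
    return (lam * np.einsum("ij,ks->ijks", d, d)
            + mu * (np.einsum("ik,js->ijks", d, d) + np.einsum("is,jk->ijks", d, d)))

def stroh_N(C, theta=0.0):
    # Q, R, T of eq. (22) and N(theta) of eq. (10)
    n = np.array([np.cos(theta), np.sin(theta), 0.0])
    m = np.array([-np.sin(theta), np.cos(theta), 0.0])
    Q = np.einsum("ijks,j,s->ik", C, n, n)
    R = np.einsum("ijks,j,s->ik", C, n, m)
    T = np.einsum("ijks,j,s->ik", C, m, m)
    Ti = np.linalg.inv(T)
    N1 = -Ti @ R.T
    return np.block([[N1, Ti], [R @ Ti @ R.T - Q, N1.T]])

def complete_integrals(C, n=64):
    # eqs. (25)-(26) at theta = pi, as the mean over one period of N(theta)
    Nbar = np.mean([stroh_N(C, th) for th in np.arange(n) * np.pi / n], axis=0)
    return Nbar[:3, :3], Nbar[:3, 3:], -Nbar[3:, :3]

mu, nu = 1.0, 0.3
lam = 2.0 * mu * nu / (1.0 - 2.0 * nu)
S_num, H_num, L_num = complete_integrals(isotropic_tensor(lam, mu))

# closed form (27)
s = (1.0 - 2.0 * nu) / (2.0 * (1.0 - nu))
gamma = 1.0 / (1.0 - nu)
S_27 = s * np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
H_27 = np.diag([(1.0 - s**2) / gamma, (1.0 - s**2) / gamma, 1.0]) / mu
L_27 = mu * np.diag([gamma, gamma, 1.0])
print(np.round(S_num, 6), np.round(H_num, 6), np.round(L_num, 6), sep="\n")
```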

For general anisotropic materials, Chadwick and Ting [23] have shown that $S$, $H$, $L$ have the same structure as (27) for isotropic materials if a proper basis and proper tensor components are chosen for the tensors $S$, $H$ and $L$. They showed that the eigenvalues of $S$ are $0$ and $\pm is$, where $s$ is real and positive. Let the associated eigenvectors be $e_3$ and $e_1 \mp i e_2$, respectively, where $e_i$, $i = 1,2,3$, are real vectors, and let the reciprocal eigenvectors $e^i$ be defined by

$$e^i \cdot e_j = \delta_{ij}.$$

If we choose the following tensor components for $S$, $H$, $L$,

$$S = S^i{}_j\, e_i \otimes e^j, \qquad H = H^{ij}\, e_i \otimes e_j, \qquad L = L_{ij}\, e^i \otimes e^j,$$
it can be shown that

$$S^i{}_j = \begin{bmatrix} 0 & -s & 0 \\ s & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad H^{ij} = \frac{1}{\mu}\begin{bmatrix} (1-s^2)/\gamma & 0 & 0 \\ 0 & (1-s^2)/\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad L_{ij} = \mu\begin{bmatrix} \gamma & 0 & 0 \\ 0 & \gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}, \tag{28}$$

where $\mu$, $\gamma$, $s$ are constants. The identical structure of (28) and (27) is striking. It should be noted that (28) for general anisotropic materials has only three constants: $0 < s < 1$, $\mu > 0$ and $\gamma > 0$.
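A quick numerical illustration of the Chadwick-Ting result, using randomly generated (hypothetical) positive definite stiffness matrices. The eigenvalues of $S$ computed from (19) should be $0$ and a conjugate pair $\pm is$ with $0 < s < 1$. This only samples a few materials; it is not a proof.

```python
import numpy as np

def voigt(i, j):
    return i if i == j else 6 - i - j

def stiffness_tensor(Cv):
    C = np.empty((3, 3, 3, 3))
    for i, j, k, s in np.ndindex(3, 3, 3, 3):
        C[i, j, k, s] = Cv[voigt(i, j), voigt(k, s)]
    return C

def stroh_N(C):
    # Q, R, T of eq. (6) and N of eq. (10)
    Q, R, T = C[:, 0, :, 0], C[:, 0, :, 1], C[:, 1, :, 1]
    Ti = np.linalg.inv(T)
    N1 = -Ti @ R.T
    return np.block([[N1, Ti], [R @ Ti @ R.T - Q, N1.T]])

def barnett_lothe_S(C):
    # S from eq. (19), with the columns of A, B scaled so that 2 a_k . b_k = 1
    p, xi = np.linalg.eig(stroh_N(C))
    keep = np.argsort(-p.imag)[:3]
    A, B = xi[:3, keep], xi[3:, keep]
    scale = np.sqrt(2.0 * np.einsum("ia,ia->a", A, B))
    A, B = A / scale, B / scale
    return (1j * (2.0 * A @ B.T - np.eye(3))).real

eigs = []
for seed in range(5):
    G = np.random.default_rng(seed).standard_normal((6, 6))
    S = barnett_lothe_S(stiffness_tensor(G @ G.T + 6.0 * np.eye(6)))
    ev = np.linalg.eigvals(S)
    eigs.append(ev[np.argsort(np.abs(ev))])   # order: the zero eigenvalue, then the pair +-is
    print(np.round(eigs[-1], 6))
```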

The three matrices $N_i$ defined in (10) depend on the elastic constants in a complicated way, particularly so for $N_1$ and $N_3$. It is shown in [15] that $-N_3$ is positive semi-definite and that $N_1$ and $N_3$ have the structure

$$N_1 = \begin{bmatrix} * & -1 & * \\ * & 0 & * \\ * & 0 & * \end{bmatrix}, \qquad -N_3 = \begin{bmatrix} * & 0 & * \\ 0 & 0 & 0 \\ * & 0 & * \end{bmatrix},$$

in which the * elements can be expressed in terms of the elastic compliances, which are the reciprocal of $C_{ijks}$. The fact that $N_3$ has the property shown above was crucial in solving the problem of the elastic wedge subject to uniform tractions on its sides [24]. Clearly, the properties of $N_1$ and $N_3$ will also be useful in solving other anisotropic elasticity problems.


While $N_1(\theta)$ and $N_3(\theta)$ do not have the same structure as $N_1$ and $N_3$, except at $\theta = 0$, if we write

$$N_i^*(\theta) = \Omega(\theta)\,N_i(\theta)\,\Omega^T(\theta),$$

$$\Omega^T(\theta) = [n(\theta),\ m(\theta),\ e_3], \qquad e_3 = (0,0,1),$$

then $N_1^*(\theta)$ and $N_3^*(\theta)$ have the same structure as $N_1$ and $N_3$. This is not surprising because the $N_i^*(\theta)$ are the $N_i$ referred to the rotated coordinate system $x^* = \Omega x$ [25]. It is clear that $-N_3^*(\theta)$ as well as $-N_3(\theta)$ are also positive semi-definite.
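The structure of the rotated matrices $N_i^*(\theta)$ can be spot-checked numerically. The sketch below uses a hypothetical random material and tests the following:

* the second column of $N_1^*$ is $(-1, 0, 0)^T$;
* the second row and column of $N_3^*$ vanish;
* $-N_3(\theta)$ has no negative eigenvalues.

The first two follow directly from (6) and (10): the symmetries of $C_{ijks}$ give $R^T e_2 = T e_1$ and $R e_1 = Q e_2$.

```python
import numpy as np

def voigt(i, j):
    return i if i == j else 6 - i - j

def stiffness_tensor(Cv):
    C = np.empty((3, 3, 3, 3))
    for i, j, k, s in np.ndindex(3, 3, 3, 3):
        C[i, j, k, s] = Cv[voigt(i, j), voigt(k, s)]
    return C

def stroh_N(C, theta=0.0):
    # Q, R, T of eq. (22) and N(theta) of eq. (10)
    n = np.array([np.cos(theta), np.sin(theta), 0.0])
    m = np.array([-np.sin(theta), np.cos(theta), 0.0])
    Q = np.einsum("ijks,j,s->ik", C, n, n)
    R = np.einsum("ijks,j,s->ik", C, n, m)
    T = np.einsum("ijks,j,s->ik", C, m, m)
    Ti = np.linalg.inv(T)
    N1 = -Ti @ R.T
    return np.block([[N1, Ti], [R @ Ti @ R.T - Q, N1.T]])

G = np.random.default_rng(2).standard_normal((6, 6))
C = stiffness_tensor(G @ G.T + 6.0 * np.eye(6))   # hypothetical material

results = []
for theta in (0.0, 0.7, 2.0):
    Nt = stroh_N(C, theta)
    c, s = np.cos(theta), np.sin(theta)
    Omega = np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])  # rows n, m, e3
    N1_star = Omega @ Nt[:3, :3] @ Omega.T
    N3_star = Omega @ Nt[3:, :3] @ Omega.T
    min_eig = np.linalg.eigvalsh(-Nt[3:, :3]).min()   # should be >= 0 up to rounding
    results.append((N1_star, N3_star, min_eig))
    print(theta, np.round(N1_star[:, 1], 12), round(min_eig, 12))
```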


NEW IDENTITIES. In many applications, the arbitrary functions $f_\alpha(z_\alpha)$ in (11) or (12) assume the same functional form for all $\alpha$. The simplest ones are powers of $z_\alpha$, i.e.,

$$f_\alpha(z_\alpha) = q_\alpha z_\alpha^\lambda \qquad (\alpha \text{ not summed}),$$

where $\lambda$ and $q_\alpha$, $\alpha = 1,2,3$, are arbitrary complex constants. If we define the diagonal matrix

$$Z^\lambda = \mathrm{diag}(z_1^\lambda, z_2^\lambda, z_3^\lambda),$$

we may write (11) as

$$u = 2\,\mathrm{Re}\{A Z^\lambda q\}, \qquad \phi = 2\,\mathrm{Re}\{B Z^\lambda q\},$$
in which the elements of $q$ are $q_1$, $q_2$, $q_3$. Replacing the complex constant $q$ by two real constants $g$ and $h$ through

$$q = A^T g + B^T h,$$

we have

$$u = 2\,\mathrm{Re}\{A Z^\lambda B^T\}\,h + 2\,\mathrm{Re}\{A Z^\lambda A^T\}\,g, \qquad \phi = 2\,\mathrm{Re}\{B Z^\lambda B^T\}\,h + 2\,\mathrm{Re}\{B Z^\lambda A^T\}\,g. \tag{29}$$

This form of solution can be used for analyzing stress singularities in a composite. In [21], the order of stress singularity $\lambda$ at an interface crack was obtained in closed form for general anisotropic elastic materials. With $\lambda$ obtained explicitly in closed form, one can examine the imaginary part of $\lambda$ and study for which combinations of materials the oscillations in displacement near the crack tip disappear.

When $\lambda$ is an integer, positive or negative, the quantities in the brackets in (29) can be expressed explicitly in real form. Using (4) and (9), we have

$$z_\alpha \xi_\alpha = (x_1 + p_\alpha x_2)\,\xi_\alpha = (x_1 I + x_2 N)\,\xi_\alpha,$$

so that

$$\begin{bmatrix} A Z^\lambda \\ B Z^\lambda \end{bmatrix} = (x_1 I + x_2 N)^\lambda \begin{bmatrix} A \\ B \end{bmatrix}.$$

If we post-multiply both sides by $[B^T\ A^T]$ and use (20), we have the identity

$$2\,\mathrm{Re}\begin{bmatrix} A Z^\lambda B^T & A Z^\lambda A^T \\ B Z^\lambda B^T & B Z^\lambda A^T \end{bmatrix} = (x_1 I + x_2 N)^\lambda, \tag{30}$$

which provides a real expression for the quantities in (29) without determining the eigenvalues $p$ and the eigenvectors $\xi$.
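The identity (30) can be checked numerically for several integers $\lambda$, including negative ones. The sketch uses a hypothetical material and an arbitrary field point $(x_1, x_2)$. It relies on `numpy.linalg.matrix_power`, which accepts negative exponents by inverting first. Errors are reported relative to the size of the right-hand side, because the powers grow quickly.

```python
import numpy as np

def voigt(i, j):
    return i if i == j else 6 - i - j

def stiffness_tensor(Cv):
    C = np.empty((3, 3, 3, 3))
    for i, j, k, s in np.ndindex(3, 3, 3, 3):
        C[i, j, k, s] = Cv[voigt(i, j), voigt(k, s)]
    return C

def stroh_N(C):
    Q, R, T = C[:, 0, :, 0], C[:, 0, :, 1], C[:, 1, :, 1]
    Ti = np.linalg.inv(T)
    N1 = -Ti @ R.T
    return np.block([[N1, Ti], [R @ Ti @ R.T - Q, N1.T]])

def stroh_AB(C):
    # eigenpairs with Im p > 0; columns scaled so that 2 a_k . b_k = 1
    p, xi = np.linalg.eig(stroh_N(C))
    keep = np.argsort(-p.imag)[:3]
    A, B = xi[:3, keep], xi[3:, keep]
    scale = np.sqrt(2.0 * np.einsum("ia,ia->a", A, B))
    return p[keep], A / scale, B / scale

G = np.random.default_rng(3).standard_normal((6, 6))
C = stiffness_tensor(G @ G.T + 6.0 * np.eye(6))   # hypothetical material
p, A, B = stroh_AB(C)
N = stroh_N(C)
x1, x2 = 0.8, -0.5                                 # arbitrary field point

def lhs(lam):
    # left-hand side of eq. (30)
    Zl = np.diag((x1 + p * x2) ** lam)
    return 2.0 * np.block([[A @ Zl @ B.T, A @ Zl @ A.T],
                           [B @ Zl @ B.T, B @ Zl @ A.T]]).real

def rhs(lam):
    # right-hand side of eq. (30)
    return np.linalg.matrix_power(x1 * np.eye(6) + x2 * N, lam)

rel_err = {lam: np.abs(lhs(lam) - rhs(lam)).max() / np.abs(rhs(lam)).max()
           for lam in (-2, -1, 0, 1, 2, 3)}
print(rel_err)
```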

Although (30) also applies for negative integer $\lambda$, its right-hand side is then not a very useful form, since it requires the inverse of a 6x6 matrix. However, if we use the polar coordinate system

$$x_1 = r\cos\theta, \qquad x_2 = r\sin\theta,$$

and employ the following identity proved in [25],

$$(\cos\theta\, I + \sin\theta\, N)^{-1} = \cos\theta\, I - \sin\theta\, N(\theta),$$

the right-hand side of (30) becomes

$$r^\lambda\{\cos\theta\, I - \sin\theta\, N(\theta)\}^{-\lambda},$$

which is a useful form for a negative integer $\lambda$.
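The sketch below spot-checks the identity $(\cos\theta\, I + \sin\theta\, N)^{-1} = \cos\theta\, I - \sin\theta\, N(\theta)$ numerically. It then checks the resulting polar form of the right-hand side of (30) for $\lambda = -2$. As before, it uses a hypothetical random material and our own helper functions.

```python
import numpy as np

def voigt(i, j):
    return i if i == j else 6 - i - j

def stiffness_tensor(Cv):
    C = np.empty((3, 3, 3, 3))
    for i, j, k, s in np.ndindex(3, 3, 3, 3):
        C[i, j, k, s] = Cv[voigt(i, j), voigt(k, s)]
    return C

def stroh_N(C, theta=0.0):
    # Q, R, T of eq. (22) and N(theta) of eq. (10)
    n = np.array([np.cos(theta), np.sin(theta), 0.0])
    m = np.array([-np.sin(theta), np.cos(theta), 0.0])
    Q = np.einsum("ijks,j,s->ik", C, n, n)
    R = np.einsum("ijks,j,s->ik", C, n, m)
    T = np.einsum("ijks,j,s->ik", C, m, m)
    Ti = np.linalg.inv(T)
    N1 = -Ti @ R.T
    return np.block([[N1, Ti], [R @ Ti @ R.T - Q, N1.T]])

G = np.random.default_rng(4).standard_normal((6, 6))
C = stiffness_tensor(G @ G.T + 6.0 * np.eye(6))   # hypothetical material
N = stroh_N(C)
I6 = np.eye(6)

errs = []
for theta in (0.4, 1.3, 2.8):
    c, s = np.cos(theta), np.sin(theta)
    lhs = np.linalg.inv(c * I6 + s * N)
    rhs = c * I6 - s * stroh_N(C, theta)
    errs.append(np.abs(lhs - rhs).max() / np.abs(rhs).max())

# (x1 I + x2 N)^(-2) versus r^(-2) {cos I - sin N(theta)}^2
r, theta = 1.7, 1.3
c, s = np.cos(theta), np.sin(theta)
direct = np.linalg.matrix_power(r * c * I6 + r * s * N, -2)
polar = r ** -2 * np.linalg.matrix_power(c * I6 - s * stroh_N(C, theta), 2)
rel_polar = np.abs(direct - polar).max() / np.abs(polar).max()
print(errs, rel_polar)
```

The polar form needs only a matrix product, whereas the direct form needs a 6x6 inverse, which is the point made above.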

For the wedge problems [24,25], $\lambda = 1$ is used for uniform tractions applied on the sides of the wedge, while $\lambda = -1$ is used for a concentrated couple applied at the wedge apex.

Another functional form for $f(z)$, which appears in the problem of a concentrated force and a line dislocation in anisotropic elastic materials, is

$$f_\alpha(z_\alpha) = q_\alpha \ln z_\alpha \qquad (\alpha \text{ not summed}).$$

Defining the diagonal matrix

$$\ln Z = \mathrm{diag}[\ln z_1, \ln z_2, \ln z_3],$$

we have

$$u = 2\,\mathrm{Re}\{A(\ln Z)B^T\}\,h + 2\,\mathrm{Re}\{A(\ln Z)A^T\}\,g, \qquad \phi = 2\,\mathrm{Re}\{B(\ln Z)B^T\}\,h + 2\,\mathrm{Re}\{B(\ln Z)A^T\}\,g. \tag{31}$$


To find the real expression for the quantities in brackets, we first notice that

$$z = r(\cos\theta + p\sin\theta),$$

and, using (24),

$$\ln z = \ln r + \ln(\cos\theta + p\sin\theta) = \ln r + \int_0^\theta p(\omega)\,d\omega.$$
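Next, from (23) we have

$$(\ln z_\alpha)\,\xi_\alpha = \{(\ln r)I + \pi\hat{N}(\theta)\}\,\xi_\alpha,$$

or

$$\begin{bmatrix} A(\ln Z) \\ B(\ln Z) \end{bmatrix} = \{(\ln r)I + \pi\hat{N}(\theta)\}\begin{bmatrix} A \\ B \end{bmatrix}, \tag{32}$$

where, following (25),

$$\hat{N}(\theta) = \frac{1}{\pi}\int_0^\theta N(\omega)\,d\omega = \begin{bmatrix} S(\theta) & H(\theta) \\ -L(\theta) & S^T(\theta) \end{bmatrix}. \tag{33}$$

Finally, we post-multiply both sides of (32) by $[B^T\ A^T]$ and use (20) to obtain the identity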


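$$2\,\mathrm{Re}\begin{bmatrix} A(\ln Z)B^T & A(\ln Z)A^T \\ B(\ln Z)B^T & B(\ln Z)A^T \end{bmatrix} = (\ln r)I + \pi\hat{N}(\theta). \tag{34}$$

With (34), (31) can be written as

$$u = \{(\ln r)I + \pi S(\theta)\}\,h + \pi H(\theta)\,g, \qquad \phi = -\pi L(\theta)\,h + \{(\ln r)I + \pi S^T(\theta)\}\,g. \tag{35}$$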


The surface traction $t$ on any radial plane $\theta = \text{constant}$ is determined by differentiating $\phi$ with respect to $r$. We have

$$t = \frac{\partial\phi}{\partial r} = \frac{1}{r}\,g, \tag{36}$$

which is independent of $\theta$. In [26], (36) was derived from the equations of equilibrium without employing the stress-strain laws. Therefore, (36) applies also to composite spaces [27] and to angularly inhomogeneous anisotropic materials [26,28]. It should be pointed out that, although $\phi$ in (35) is not valid for angularly inhomogeneous materials, $u$ in (35) remains valid in such materials. The only modification required is in (33), where the integrand $N(\omega)$ contains $C_{ijks}$ that depend on $\theta$.
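The sketch below first checks (34) numerically for a hypothetical non-degenerate material. The integral of $N(\omega)$ over $[0, \theta]$ is computed by Gauss-Legendre quadrature (`numpy.polynomial.legendre.leggauss`). The sketch then differentiates $\phi$ from (31) with respect to $r$ on several radial planes. This illustrates that the traction on a radial plane does not depend on $\theta$; here it equals $g/r$. The constants `h`, `g`, `r` and the quadrature order are arbitrary choices.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

def voigt(i, j):
    return i if i == j else 6 - i - j

def stiffness_tensor(Cv):
    C = np.empty((3, 3, 3, 3))
    for i, j, k, s in np.ndindex(3, 3, 3, 3):
        C[i, j, k, s] = Cv[voigt(i, j), voigt(k, s)]
    return C

def stroh_N(C, theta=0.0):
    n = np.array([np.cos(theta), np.sin(theta), 0.0])
    m = np.array([-np.sin(theta), np.cos(theta), 0.0])
    Q = np.einsum("ijks,j,s->ik", C, n, n)
    R = np.einsum("ijks,j,s->ik", C, n, m)
    T = np.einsum("ijks,j,s->ik", C, m, m)
    Ti = np.linalg.inv(T)
    N1 = -Ti @ R.T
    return np.block([[N1, Ti], [R @ Ti @ R.T - Q, N1.T]])

G = np.random.default_rng(5).standard_normal((6, 6))
C = stiffness_tensor(G @ G.T + 6.0 * np.eye(6))   # hypothetical material
p, xi = np.linalg.eig(stroh_N(C))
keep = np.argsort(-p.imag)[:3]                     # Im p > 0
p, A, B = p[keep], xi[:3, keep], xi[3:, keep]
scale = np.sqrt(2.0 * np.einsum("ia,ia->a", A, B))
A, B = A / scale, B / scale                        # 2 a_k . b_k = 1

def integral_N(theta, n=64):
    # integral of N(omega) over [0, theta] by Gauss-Legendre quadrature
    x, w = leggauss(n)
    omega = 0.5 * theta * (x + 1.0)
    return 0.5 * theta * sum(wk * stroh_N(C, om) for wk, om in zip(w, omega))

def lnZ(r, theta):
    # principal log; z_alpha stays in the upper half-plane for 0 < theta < pi
    return np.diag(np.log(r * (np.cos(theta) + p * np.sin(theta))))

r, theta = 2.5, 1.1
LZ = lnZ(r, theta)
lhs = 2.0 * np.block([[A @ LZ @ B.T, A @ LZ @ A.T], [B @ LZ @ B.T, B @ LZ @ A.T]]).real
err34 = np.abs(lhs - (np.log(r) * np.eye(6) + integral_N(theta))).max()

h = np.array([0.3, -1.0, 0.5])   # arbitrary real constants in (31)
g = np.array([1.0, 0.2, -0.7])

def phi(r, theta):
    # stress function from (31)
    LZ = lnZ(r, theta)
    return 2.0 * (B @ LZ @ B.T @ h + B @ LZ @ A.T @ g).real

dr = 1e-5
tractions = [(phi(r + dr, th) - phi(r - dr, th)) / (2 * dr) for th in (0.5, 1.5, 2.5)]
print(err34, tractions, g / r)
```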


CONCLUDING REMARKS. The Stroh formalism is elegant and powerful. It is also very effective in treating surface waves [3,8,29], Stoneley waves [30,31] and waves in layered composites [32]. The real matrices $N_i(\theta)$, the incomplete integrals $S(\theta)$, $H(\theta)$, $L(\theta)$ and the complete integrals $S$, $H$, $L$, which are the Barnett-Lothe tensors, appear often in the solutions to anisotropic elasticity problems. The striking simplicity of the structure of $S$, $H$, $L$ for general anisotropic elastic materials, shown in (28), is puzzling. It is believed that $S(\theta)$, $H(\theta)$, $L(\theta)$, as well as $S$, $H$, $L$, have physical interpretations. For $L(\theta)$, it is shown in [24] that if the stress in the anisotropic elastic material depends on $\theta$ only, the stress tensor is, with the exception of one component, proportional to $L(\theta)$.
ABSTRACT. We examine the problem of designing a homogeneous, isotropic elastic slab that totally absorbs an incident plane wave.

INTRODUCTION. In [1] we presented a systematic method to analyze the interaction of steady-state, harmonic plane waves with stratified elastic media. In this paper we apply our analysis to design a homogeneous, isotropic elastic slab that totally absorbs an incident plane wave propagating through an adjacent fluid half-space. To begin, consider a homogeneous, isotropic elastic solid half-space in contact with a fluid half-space. When a harmonic plane wave travels through the fluid and strikes the solid-fluid interface, the propagation directions of the reflected and refracted waves are determined by Snell's laws, while their amplitudes are determined from the continuity of displacements and tractions at the interface. For a solid-fluid interface, we have no control over the reflected and refracted waves; the outcomes are governed by the fundamental laws of physics. But when an elastic slab is inserted between the fluid and the solid half-spaces, we show that the mechanical properties of the slab can be chosen so that the amplitude of the reflected wave vanishes. The choice of the mechanical properties depends on the frequency and the angle of incidence of the incoming wave.

This research was supported by U.S. Army Research Office Contract DAAL03-89-G-0082.
NORMAL  INCIDENCE.  In  this  section,  we  consider  the  case  where  the  incident  wave
is  normal  to  the  fluid-slab  interface.  This  particular  case  can  be  treated  directly,  without
reference  to  our  earlier  analysis.  Nevertheless,  the  results  provide  insight  into  the
analysis  of  oblique  incidence.  Suppose  that  the  x-axis  is  oriented  perpendicular  to  the  slab  and
the  slab  occupies  the  region  $0 \le x \le T$,  where  $x = T$  is  the  slab-fluid  interface.  The  equation  of
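motion  is

$$\rho(x)\,\frac{\partial^2 v}{\partial t^2} = \frac{\partial}{\partial x}\left(\kappa(x)\,\frac{\partial v}{\partial x}\right),$$

where  $v = v(x,t)$  is  the  displacement  at  position  $x$  and  at  time  $t$.  We  assume  that

$$\kappa(x) = \kappa_1,\ \rho(x) = \rho_1 \ \text{for } x > T; \qquad \kappa(x) = \kappa,\ \rho(x) = \rho \ \text{for } 0 < x < T; \qquad \kappa(x) = \kappa_0,\ \rho(x) = \rho_0 \ \text{for } x < 0.$$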

Assuming  harmonic  time  dependence  and  a  unit  amplitude  for  the  incident  wave,  the  general
solution  of  the  equation  of  motion  has  the  form  $v(x,t) = u(x)e^{i\omega t}$,  where  $\omega$  is  the  wave
frequency  and

$$u(x) = \begin{cases} e^{i\omega s_1(x-T)} + r\,e^{-i\omega s_1(x-T)}, & x > T,\\ \tau_+ e^{i\omega s x} + \tau_- e^{-i\omega s x}, & 0 \le x \le T,\\ \tau\,e^{i\omega s_0 x}, & x < 0. \end{cases}$$

Here  the  slowness  parameters  $s_1$,  $s$,  and  $s_0$  are  defined  by

$$s_1 = \sqrt{\rho_1/\kappa_1}, \qquad s = \sqrt{\rho/\kappa}, \qquad s_0 = \sqrt{\rho_0/\kappa_0}.$$

(The  slowness  is  the  reciprocal  of  the  wave  speed.)  The  amplitudes  $r$,  $\tau$,  $\tau_+$,  and  $\tau_-$  can  be
determined  from  the  continuity  of  displacement  $v$  and  stress  $\kappa\,\partial v/\partial x$  at  the  interfaces  $x = 0$  and
$x = T$.  Altogether,  there  are  four  equations  of  continuity:
$$\tau = \tau_+ + \tau_-,$$

$$1 + r = \tau_+ e^{i\omega sT} + \tau_- e^{-i\omega sT},$$

$$s_0\kappa_0\tau = \kappa s\,(\tau_+ - \tau_-),$$

$$s_1\kappa_1(1 - r) = \kappa s\left(\tau_+ e^{i\omega sT} - \tau_- e^{-i\omega sT}\right).$$


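Solving  these  equations  for  $r$  and  setting  $r = 0$  yields  the  relation

$$\frac{\sigma_1 - \sigma}{\sigma_1 + \sigma} = \frac{\sigma_0 - \sigma}{\sigma_0 + \sigma}\,e^{-2i\omega sT}, \tag{1}$$

where

$$\sigma_1 = \sqrt{\kappa_1\rho_1}, \qquad \sigma = \sqrt{\kappa\rho}, \qquad \sigma_0 = \sqrt{\kappa_0\rho_0}.$$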


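Since  the  mechanical  parameters  are  all  real,  both  fractions  in  (1)  are  real,  while  the  exponential  term
has  unit  modulus.  Hence  (unless  both  fractions  vanish)  equation  (1)  can  hold  only  when  the  exponential
term  is  real,  that  is,  equal  to  $+1$  or  $-1$.  There  are  therefore  two  cases  to  consider.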

Case  1.  $e^{-2i\omega sT} = -1$.

In  this  case,  the  exponent  $2\omega sT$  is  an  odd  multiple  of  $\pi$.  In  other  words,  $\omega sT = (m + \tfrac12)\pi$
for  some  integer  $m$,  or,  equivalently,

$$\omega T\sqrt{\rho/\kappa} = (m + \tfrac12)\pi. \tag{2}$$

Substituting  $-1$  for  the  exponential  term  in  (1)  gives  $\sigma^2 = \sigma_0\sigma_1$  or,  equivalently,

$$\kappa\rho = \sqrt{\kappa_0\rho_0}\,\sqrt{\kappa_1\rho_1}. \tag{3}$$

Thus  the  impedance  $\sqrt{\kappa\rho}$  of  the  slab  is  the  geometric  mean  of  the  impedances  of  the  half-spaces
it  separates.  Together,  equations  (2)  and  (3)  determine  the  ratio  $\rho/\kappa$  and  the  product
$\rho\kappa$.  Therefore,  they  determine  a  unique  $\rho$  and  $\kappa$  for  each  choice  of  the  integer  $m \ge 0$  in  (2).
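A minimal numerical check of Case 1, as a sketch. Eliminating $\tau$, $\tau_+$ and $\tau_-$ from the four continuity equations (our own elimination, not a formula quoted from [1]) gives $r = (Z-1)/(Z+1)$ with $Z = (\sigma_1/\sigma)(\sigma\cos\varphi + i\sigma_0\sin\varphi)/(\sigma_0\cos\varphi + i\sigma\sin\varphi)$ and $\varphi = \omega sT$. The code picks $\kappa$ and $\rho$ from (2) and (3) and confirms that $r$ vanishes. The water-like fluid values echo the numerical experiments later in the paper; the solid's density and speed are hypothetical.

```python
import math

def reflectivity(sigma1, sigma, sigma0, phi):
    """Normal-incidence reflection coefficient r of the slab.

    sigma1, sigma, sigma0: impedances sqrt(kappa*rho) of the fluid (x > T),
    the slab (0 <= x <= T) and the left half-space (x < 0).
    phi: phase omega*s*T accumulated across the slab.
    Obtained by eliminating tau, tau_+ and tau_- from the continuity equations.
    """
    c, s = math.cos(phi), math.sin(phi)
    z = (sigma1 / sigma) * (sigma * c + 1j * sigma0 * s) / (sigma0 * c + 1j * sigma * s)
    return (z - 1) / (z + 1)


def case1_slab(kappa1, rho1, kappa0, rho0, omega, T, m):
    """Slab modulus kappa and density rho solving (2) and (3) for an integer m >= 0."""
    sigma = math.sqrt(math.sqrt(kappa0 * rho0) * math.sqrt(kappa1 * rho1))  # eq. (3)
    s = (m + 0.5) * math.pi / (omega * T)                                    # eq. (2)
    # rho * kappa = sigma**2 and rho / kappa = s**2
    return sigma / s, sigma * s


# cgs units; the left half-space values are hypothetical.
rho1, c1 = 1.0, 1.4e5          # water-like fluid
rho0, c0 = 7.0, 5.0e5          # hypothetical solid
kappa1, kappa0 = rho1 * c1 ** 2, rho0 * c0 ** 2
omega, T = 600 * math.pi, 5.0

kappa, rho = case1_slab(kappa1, rho1, kappa0, rho0, omega, T, m=0)
phi = omega * math.sqrt(rho / kappa) * T
r = reflectivity(math.sqrt(kappa1 * rho1), math.sqrt(kappa * rho),
                 math.sqrt(kappa0 * rho0), phi)
print(abs(r))   # zero up to floating-point rounding
```

Replacing the tuned slab by one that violates (3) gives a clearly nonzero $r$, which is a quick way to see that both conditions are needed.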




Case  2.  $e^{-2i\omega sT} = +1$.

In  this  case,  the  exponent  $2\omega sT$  is  an  even  multiple  of  $\pi$,  which  implies  that

$$\omega sT = \omega T\sqrt{\rho/\kappa} = m\pi \tag{4}$$

for  some  positive  integer  $m$.  Again,  this  equation  restricts  the  slowness  to  a  countable  set  of  discrete
values.  However,  when  we  substitute  $+1$  for  the  exponential  term  in  (1),  we  see  that  $\sigma_0 = \sigma_1$.
That  is,  this  case  occurs  only  when  the  materials  in  the  two  half-spaces  have  the  same
impedance.  On  the  other  hand,  if  the  impedances  match,  then  for  each  positive  integer  $m$,  there  is  a  1-parameter
family  of  slab  materials,  with  slowness  given  by  (4),  that  totally  absorbs  the
incoming  wave.
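The step from (1) to $\sigma_0 = \sigma_1$ in Case 2 is a single cross-multiplication (our expansion, shown for checking):

```latex
\frac{\sigma_1-\sigma}{\sigma_1+\sigma}=\frac{\sigma_0-\sigma}{\sigma_0+\sigma}
\iff (\sigma_1-\sigma)(\sigma_0+\sigma)-(\sigma_0-\sigma)(\sigma_1+\sigma)=0
\iff 2\sigma(\sigma_1-\sigma_0)=0 ,
```

so, since $\sigma > 0$, the condition is $\sigma_0 = \sigma_1$, and the slab impedance $\sigma$ itself stays free, which is the 1-parameter family.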

OBLIQUE  INCIDENCE.  Now  let  us  consider  a  plane  wave  that  strikes  the  fluid-slab
interface  at  an  oblique  angle,  generating  reflected  and  transmitted  (refracted)  waves.  Again,  we
will  show  that  the  material  in  the  slab  can  be  chosen  to  annihilate  the  reflected  wave.

To  begin,  we  briefly  review  wave  propagation  in  homogeneous,  isotropic  materials.  Let
$\rho$  denote  the  density,  and  $\mu$  and  $\lambda$  denote  the  Lamé  moduli  of  a  homogeneous,  isotropic,
linearly  elastic  material.  If  $\mu > 0$  and  $2\mu + \lambda > 0$,  then  exactly  two  types  of  waves  propagate  in
the  elastic  medium:  dilatational  waves,  in  which  the  directions  of  displacement  and  propagation
coincide,  and  shear  waves,  in  which  the  directions  of  displacement  and  propagation  are
orthogonal  to  each  other.  Let  $c_d$  and  $c_s$  denote  the  dilatational  and  shear  wave  speeds  defined
by

$$c_d = \sqrt{(2\mu + \lambda)/\rho} \qquad\text{and}\qquad c_s = \sqrt{\mu/\rho},$$

and  let  $D$  and  $S$  denote  the  dilatational  and  shear  slowness  given  by

$$D = 1/c_d \qquad\text{and}\qquad S = 1/c_s.$$
For  any  unit  vector  $d$,  the  expression  $v(x,t) = d\,f(t - D\,x\cdot d)$  defines  a  plane  dilatational  wave
which  formally  satisfies  the  equation  of  motion.  Similarly,  given  two  unit  vectors  $s$  and  $p$
where  $s\cdot p = 0$,  the  expression  $v(x,t) = p\,g(t - S\,x\cdot s)$  defines  a  plane  shear  wave  which  formally
satisfies  the  equation  of  motion.  The  functions  $f$  and  $g$  are  called  the  wave  profiles,  the  vectors
$d$  and  $s$  are  the  propagation  vectors,  and  the  shear  wave  is  said  to  be  polarized  in  the  direction
$p$.  Throughout  this  paper,  we  consider  harmonic  waves;  in  principle,  waves  of  more  general
form  can  be  synthesized  by  the  superposition  of  harmonic  waves.  The  motion  of  harmonic
waves  is  described  by  the  real  or  the  imaginary  parts  of  the  expressions

$$v(x,t) = \delta\,d\,e^{i\omega(t - D\,d\cdot x)} \qquad\text{and}\qquad v(x,t) = \varsigma\,p\,e^{i\omega(t - S\,s\cdot x)}.$$

Consider  a  plane  interface  I  separating  two  distinct  half-spaces  of  homogeneous,  isotropic 
elastic  materials.  A  dilatational  wave  striking  the  interface  typically  generates  a  reflected 
dilatational  wave,  a  reflected  shear  wave,  a  refracted  dilatational  wave,  and  a  refracted  shear 
wave.  Similarly,  a  shear  wave  striking  the  interface  typically  generates  waves  of  all  four  types. 
Therefore,  when  a  combination  of  dilatational  and  shear  waves  impinges  upon  the  interface, 
eight  different  waves  are  generated  altogether.  The  plane  formed  by  the  propagation  vector  of 
an  incident  wave  and  the  normal  to  the  interface  I  is  called  the  plane  of  incidence  for  the  wave.
The  propagation  vectors  of  the  outgoing  waves  are  determined  by  a  set  of  equations  known  as 
Snell's  Laws  which  we  state  as  follows: 

The  propagation  vectors  $d_r$  and  $s_r$  for  a  reflected  wave  and  the  propagation
vectors  $d_t$  and  $s_t$  for  a  transmitted  wave  lie  in  the  plane  of  incidence  for  the
incoming  wave.  Moreover,  if  $m$  is  a  unit  vector  in  the  intersection  of  the
interface  and  the  plane  of  incidence,  then  for  an  incident  dilatational  wave  with
propagation  vector  $d$,  we  have

$$D\,d\cdot m = D\,d_r\cdot m = S\,s_r\cdot m = D_t\,d_t\cdot m = S_t\,s_t\cdot m, \tag{5}$$

and,  for  an  incident  shear  wave  with  propagation  vector  $s$,  we  have

$$S\,s\cdot m = D\,d_r\cdot m = S\,s_r\cdot m = D_t\,d_t\cdot m = S_t\,s_t\cdot m, \tag{6}$$

where  $D_t$  and  $S_t$  are  the  slownesses  of  the  material  on  the  transmitted  side.
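In (5) and (6) every term equals the same tangential slowness $p$, so the sine of each outgoing angle (measured from the interface normal) is $p$ times that wave's speed; applied interface by interface, the same $p$ carries through the slab. A minimal sketch, assuming the left half-space values used in the numerical experiments later in the paper ($\lambda_0 = \lambda_1$, $\mu_0 = \lambda_0/3$), computes the angles of the dilatational and shear waves transmitted into the left half-space; the helper names are ours.

```python
import math

def wave_speeds(rho, lam, mu):
    """Dilatational and shear speeds: c_d = sqrt((2 mu + lam)/rho), c_s = sqrt(mu/rho)."""
    return math.sqrt((2.0 * mu + lam) / rho), math.sqrt(mu / rho)


def outgoing_sine(p, slowness):
    """Sine of the angle (from the interface normal) of an outgoing wave.

    p is the conserved tangential slowness, e.g. p = D d.m for an incident
    dilatational wave in (5); slowness is D or S of the outgoing wave.
    Returns None when p / slowness > 1: no propagating wave of that type.
    """
    sin_out = p / slowness
    return sin_out if sin_out <= 1.0 else None


# Fluid and left half-space parameters (cgs units).
rho1, cd1 = 1.0, 1.4e5
lam1 = rho1 * cd1 ** 2
cd0, cs0 = wave_speeds(7.0, lam1, lam1 / 3.0)
p = math.sin(math.radians(30.0)) / cd1          # D1 d1.m for a 30-degree incident wave
sin_d = outgoing_sine(p, 1.0 / cd0)             # transmitted dilatational wave
sin_s = outgoing_sine(p, 1.0 / cs0)             # transmitted shear wave
print(math.degrees(math.asin(sin_d)), math.degrees(math.asin(sin_s)))
```

Returning `None` rather than raising makes the post-critical (evanescent) case explicit for the caller.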

Given  the  unit  propagation  vectors  $d$  and  $s$  of  the  incident  waves,  equations  (5)  and  (6)
determine  the  propagation  vectors  of  the  corresponding  scattered  waves.  Note  that  if  a  pair  of
incident  dilatational  and  shear  waves  share  a  common  plane  of  incidence  and  satisfy  the
relation  $D\,d\cdot m = S\,s\cdot m$,  then  the  reflected  waves  generated  by  the  two  incident  waves  coincide  in  direction,
type  by  type,  as  do  the  refracted  waves.  In  other  words,  there  are  four  rather  than  eight  outgoing  waves.  A  pair  $(d, s)$
of  incident  waves  which  lie  in  the  same  plane  of  incidence  and  which  satisfy  the  relation
$D\,d\cdot m = S\,s\cdot m$  will  be  called  a  conjugate  pair  of  waves.  Note  that,  when  a  wave  strikes  an
interface  between  two  homogeneous  materials,  both  the  reflected  and  the  transmitted  waves
form  conjugate  pairs.

Let  us  now  consider  a  homogeneous  elastic  slab  of  thickness  $T$  separating  an  isotropic
fluid  to  the  right  of  the  slab  from  a  homogeneous,  isotropic  elastic  material  to  the  left  of  the
slab.  The  ratio  of  the  amplitudes  of  the  reflected  and  incident  waves  is  the  reflectivity  of  the
slab.  In  [1],  we  obtain  a  formula  for  the  reflectivity  in  terms  of  a  local  impedance
tensor.  Suppose  that  a  conjugate  pair  of  waves  has  propagation  directions  $d$  and  $s$  and
polarization  direction  $p$  contained  in  the  plane  of  incidence  for  an  elastic  material.  Viewing  $d$
and  $p$  as  2-dimensional  vectors  in  the  plane  of  incidence,  we  define  the  $2\times 2$  matrices

$$A = [\,d \mid p\,] \qquad\text{and}\qquad B = \big[\,D\{2\mu(d\cdot n)d + \lambda n\} \;\big|\; S\mu\{(s\cdot n)p + (p\cdot n)s\}\,\big].$$

Then  the  local  impedance  tensor  $H$  is  given  by  $H = BA^{-1}$.

Let  $H$  denote  the  local  impedance  tensor  of  the  slab,  $H_0$  the  local  impedance  tensor  of  the
left  half-space,  $n$  the  normal  to  the  slab-fluid  interface  (pointing  into  the  slab),  and  $d_1$  and  $D_1$
the  propagation  direction  and  slowness  of  the  incident  wave.  We  regard  the  fluid  as  a
degenerate  elastic  solid  with  Lamé  moduli  $\mu_1 = 0$  and  $\lambda_1 > 0$.  By  Lemmas  4.1  and  5.1  in  [1],
the  reflectivity  $r$  of  the  slab  can  be  expressed  as

$$r = \frac{n^{\mathsf T}d_1 - \lambda_1 D_1\,n^{\mathsf T}\Gamma n}{n^{\mathsf T}d_1 + \lambda_1 D_1\,n^{\mathsf T}\Gamma n},$$

where

$$\Gamma = [I + L][H - PHPL]^{-1}, \qquad L = PA\Lambda A^{-1}P\,[PHP + H_0]^{-1}[H - H_0]\,A\Lambda A^{-1}, \tag{7}$$

$P = I - 2nn^{\mathsf T}$,  and

$$\Lambda = \begin{bmatrix} e^{-i\omega DT\,d\cdot n} & 0 \\ 0 & e^{-i\omega ST\,s\cdot n} \end{bmatrix}.$$


Again,  $T$  denotes  the  thickness  of  the  slab,  $D$  and  $S$  are  the  dilatational  slowness  and  shear
slowness  of  the  slab,  and  $\omega$  is  the  frequency  of  the  incident  wave.

Now  consider  the  problem  of  choosing  the  slab  material  in  order  to  annihilate  the
reflected  wave.  Observe  that  the  reflectivity  is  zero  if  and  only  if

$$n^{\mathsf T}\Gamma n = \frac{n^{\mathsf T}d_1}{\lambda_1 D_1}. \tag{8}$$


Since  the  right  side  of  this  equation  is  real,  the  left  side  must  be  real  also.  The  only  way  that
complex  numbers  enter  $\Gamma$  is  through  the  diagonal  matrix  $\Lambda$,  which  appears  twice  as  a  factor  in  $L$.
Let  $a = \omega DT\,d\cdot n$  and  $b = \omega ST\,s\cdot n$  be  the  parameters  that  appear  in  the  exponents  on  the
diagonal  of  $\Lambda$.  In  order  to  ensure  that  $L$  is  real,  we  must  choose  $a$  and  $b$  such  that

$$e^{-2ia} = \pm 1, \qquad e^{-2ib} = \pm 1, \qquad\text{and}\qquad e^{-i(a+b)} = \pm 1.$$

Hence,  either  $a = m\pi$  and  $b = n\pi$,  or  $a = (m + \tfrac12)\pi$  and  $b = (n + \tfrac12)\pi$  for  integers  $m$  and  $n$.
Defining  the  matrix  $J$  by

$$J = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix},$$

there  are  essentially  4  distinct  $\Lambda$,  corresponding  to  different  choices  of  $m$  and  $n$,  that  we  need
to  consider:

$\Lambda = I$:  $a = m\pi$,  $b = n\pi$,  $m$  and  $n$  even,
$\Lambda = J$:  $a = m\pi$,  $b = n\pi$,  $m$  odd  and  $n$  even,  or  $m$  even  and  $n$  odd,
$\Lambda = iI$:  $a = (m + \tfrac12)\pi$,  $b = (n + \tfrac12)\pi$,  $m$  and  $n$  even,
$\Lambda = iJ$:  $a = (m + \tfrac12)\pi$,  $b = (n + \tfrac12)\pi$,  $m$  odd  and  $n$  even,  or  $m$  even  and  $n$  odd.


Each  of  these  cases  will  be  analyzed  in  the  following  sections.  When  studying  normal
incidence,  we  saw  that  there  was  one  "degenerate"  case  in  which  total  absorption  was  possible  only
if  the  impedances  of  the  left  and  right  half-spaces  were  identical.  For  oblique
incidence,  the  degenerate  case  is  $\Lambda = I$.


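THE  CASE  $\Lambda = I$.

Lemma  1.  If  $\Lambda = I$,  then  $r = 0$  if  and  only  if

$$n^{\mathsf T}H_0^{-1}n = \frac{n^{\mathsf T}d_1}{\lambda_1 D_1}.$$

Proof.  Observe  that  $L = [PHP + H_0]^{-1}[H - H_0]$  when  $\Lambda = I$.  Hence,  the  second  factor  in
the  definition  of  $\Gamma$  can  be  written

$$\begin{aligned} H - PHPL &= H - PHP[PHP + H_0]^{-1}[H - H_0] \\ &= H - (PHP + H_0 - H_0)[PHP + H_0]^{-1}[H - H_0] \\ &= H_0\big(I + [PHP + H_0]^{-1}[H - H_0]\big) \\ &= H_0(I + L). \end{aligned}$$

Referring  to  the  definition  of  $\Gamma$,  we  see  that  $\Gamma = H_0^{-1}$.  Equation  (8)  completes  the  proof.  □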
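The algebra in the proof of Lemma 1 can be spot-checked in exact rational arithmetic. The sketch below takes $P = \mathrm{diag}(-1, 1)$, i.e. $n = (1, 0)$, and uses two arbitrary test matrices as stand-ins for $H$ and $H_0$; they are placeholders, not impedance tensors from the paper, and the helper names are ours.

```python
from fractions import Fraction as Fr

def mul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

def sub(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(2)] for i in range(2)]

def inv(X):
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det], [-X[1][0] / det, X[0][0] / det]]

I = [[Fr(1), Fr(0)], [Fr(0), Fr(1)]]
P = [[Fr(-1), Fr(0)], [Fr(0), Fr(1)]]       # P = I - 2 n n^T with n = (1, 0)

def lemma1_check(H, H0):
    """Return (H - PHPL, H0 (I + L), Gamma) for Lambda = I."""
    PHP = mul(mul(P, H), P)
    L = mul(inv(add(PHP, H0)), sub(H, H0))   # L when Lambda = I
    lhs = sub(H, mul(PHP, L))
    rhs = mul(H0, add(I, L))
    Gamma = mul(add(I, L), inv(lhs))         # definition (7)
    return lhs, rhs, Gamma

H = [[Fr(3), Fr(1)], [Fr(-2), Fr(5)]]        # arbitrary nonsingular test matrices
H0 = [[Fr(2), Fr(0)], [Fr(1), Fr(4)]]
lhs, rhs, Gamma = lemma1_check(H, H0)
print(lhs == rhs, Gamma == inv(H0))
```

Exact fractions make the comparison an identity test rather than a tolerance test.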

When  $\Lambda = I$,  the  incoming  wave  is  absorbed  only  if  the  elastic  material  in  the  left  half-space
satisfies  the  special  condition  given  in  Lemma  1.  On  the  other  hand,  if  the  condition  of
Lemma  1  is  satisfied,  there  is  a  1-parameter  family  of  slab  materials  that  annihilates  the
reflected  wave.  In  particular,  any  material  that  satisfies  the  conditions

$$\omega DT\,d\cdot n = m\pi \qquad\text{and}\qquad \omega ST\,s\cdot n = n\pi,$$

where  $m$  and  $n$  are  even  integers,  annihilates  the  reflected  wave.  Given  the  angle  of  the
incident  wave,  the  expressions  $d\cdot n$  and  $s\cdot n$  can  be  evaluated  using  Snell's  Laws.  Omitting  the
algebra,  it  follows  that  the  incident  wave  is  absorbed  totally  when


$$\mu = \rho c_s^2, \qquad \lambda = \rho c_d^2 - 2\mu, \qquad c_d^2 = \frac{\omega^2T^2}{m^2\pi^2 + (\omega TD_1\sin\alpha_1)^2}, \qquad c_s^2 = \frac{\omega^2T^2}{n^2\pi^2 + (\omega TD_1\sin\alpha_1)^2}, \tag{9}$$

where  $\alpha_1$  is  the  angle  of  the  incident  wave  relative  to  the  normal  to  the  slab  interface  and  $m$
and  $n$  are  even  integers.  Treating  the  wave  frequency  $\omega$,  the  slab  thickness  $T$,  and  the  angle  of
incidence  $\alpha_1$  as  constants,  there  is  a  1-parameter  family  of  perfect  absorbers,  with  Lamé
moduli  $\mu$  and  $\lambda$  given  by  (9)  in  terms  of  the  parameter  $\rho$,  corresponding  to  each  pair  of  even
integers  $m$  and  $n$.
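A sketch of the 1-parameter family in (9), read as $\mu = \rho c_s^2$ and $\lambda = \rho c_d^2 - 2\mu$ with $c_d^2 = \omega^2T^2/(m^2\pi^2 + (\omega TD_1\sin\alpha_1)^2)$ and $c_s^2 = \omega^2T^2/(n^2\pi^2 + (\omega TD_1\sin\alpha_1)^2)$. For several densities it recomputes, via Snell's Laws ($D\sin\alpha = S\sin\beta = D_1\sin\alpha_1$), the phases $\omega DT\,d\cdot n$ and $\omega ST\,s\cdot n$ and shows they stay at $m\pi$ and $n\pi$. Frequency, thickness, angle and fluid speed follow the numerical experiments; $m = 2$, $n = 4$ are illustrative.

```python
import math

def absorber_moduli(rho, omega, T, D1, alpha1, m, n):
    """Lame moduli (mu, lam) of an absorbing slab of density rho, per (9); m, n even."""
    q = omega * T * D1 * math.sin(alpha1)          # omega*T*D1*sin(alpha1)
    cd2 = (omega * T) ** 2 / ((m * math.pi) ** 2 + q ** 2)
    cs2 = (omega * T) ** 2 / ((n * math.pi) ** 2 + q ** 2)
    mu = rho * cs2
    lam = rho * cd2 - 2.0 * mu
    return mu, lam


def slab_phases(rho, mu, lam, omega, T, D1, alpha1):
    """Return (a, b) = (omega*D*T*(d.n), omega*S*T*(s.n)) inside the slab."""
    D = math.sqrt(rho / (lam + 2.0 * mu))          # dilatational slowness
    S = math.sqrt(rho / mu)                        # shear slowness
    p = D1 * math.sin(alpha1)                      # tangential slowness (Snell's Laws)
    # D*(d.n) = D*cos(alpha) = sqrt(D^2 - p^2), and likewise for S
    return omega * T * math.sqrt(D * D - p * p), omega * T * math.sqrt(S * S - p * p)


omega, T, D1, alpha1 = 600 * math.pi, 5.0, 1 / 1.4e5, math.radians(30)
for rho in (0.5, 2.0, 8.0):                        # one family member per density
    mu, lam = absorber_moduli(rho, omega, T, D1, alpha1, m=2, n=4)
    a, b = slab_phases(rho, mu, lam, omega, T, D1, alpha1)
    print(rho, mu, lam, a / math.pi, b / math.pi)  # a/pi, b/pi: 2 and 4 up to rounding
```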


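THE  STRUCTURE  OF  $\Gamma$.  In  order  to  analyze  the  other  choices  of  $\Lambda$,  it  helps  to  see
how  $H$  depends  on  $\rho$.  From  the  definition  of  $a$  and  $b$,

$$c_d = \frac{\omega T\,d\cdot n}{a} \qquad\text{and}\qquad c_s = \frac{\omega T\,s\cdot n}{b}.$$

Also,  by  Snell's  Laws,  we  have

$$c_d = c_{d1}\,\frac{d\cdot m}{d_1\cdot m} \qquad\text{and}\qquad c_s = c_{d1}\,\frac{s\cdot m}{d_1\cdot m},$$

where  $c_{d1} = 1/D_1$  is  the  wave  speed  in  the  fluid.  Hence,  the  tangents  of  the  angles  $\alpha$  and  $\beta$  (relative  to  the  interface  normal  $n$)  of  the  dilatational
and  shear  waves  transmitted  in  the  slab  are  determined:

$$\tan\alpha = \frac{\omega TD_1\sin\alpha_1}{a} \qquad\text{and}\qquad \tan\beta = \frac{\omega TD_1\sin\alpha_1}{b}. \tag{10}$$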
Finally,  the  wave  speeds  are  given  by

$$c_d = \frac{c_{d1}\sin\alpha}{d_1\cdot m} \qquad\text{and}\qquad c_s = \frac{c_{d1}\sin\beta}{d_1\cdot m}.$$
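Equating the two expressions for $c_d$ (and for $c_s$) gives $\tan\alpha = \omega TD_1\sin\alpha_1/a$ and $\tan\beta = \omega TD_1\sin\alpha_1/b$ with $D_1 = 1/c_{d1}$ and $d_1\cdot m = \sin\alpha_1$. Under that reading, the small sketch below computes $\alpha$, $\beta$, $c_d$ and $c_s$ from the phases $a$ and $b$, and shows the angles shrinking as $m$ and $n$ grow, the fact used in the proof of Theorem 1. Parameter values follow the numerical experiments; the choices of $m$ and $n$ (even, so $\Lambda = iI$) are illustrative.

```python
import math

def slab_angles_and_speeds(a, b, omega, T, cd1, alpha1):
    """Angles alpha, beta and speeds c_d, c_s in the slab for given phases a and b.

    tan(alpha) = omega*T*D1*sin(alpha1)/a and tan(beta) = omega*T*D1*sin(alpha1)/b
    with D1 = 1/cd1; then c_d = cd1*sin(alpha)/sin(alpha1), c_s = cd1*sin(beta)/sin(alpha1).
    """
    k = omega * T * math.sin(alpha1) / cd1
    alpha, beta = math.atan(k / a), math.atan(k / b)
    cd = cd1 * math.sin(alpha) / math.sin(alpha1)
    cs = cd1 * math.sin(beta) / math.sin(alpha1)
    return alpha, beta, cd, cs


omega, T, cd1, alpha1 = 600 * math.pi, 5.0, 1.4e5, math.radians(30)
for m, n in [(0, 0), (2, 4), (20, 40)]:            # Lambda = iI: m and n even
    a, b = (m + 0.5) * math.pi, (n + 0.5) * math.pi
    alpha, beta, cd, cs = slab_angles_and_speeds(a, b, omega, T, cd1, alpha1)
    print(m, n, math.degrees(alpha), math.degrees(beta), cd, cs)
```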


In  [1]  we  present  an  explicit  formula  for  $H$  relative  to  a  rectangular  coordinate  system
with  $n$  pointing  along  the  positive  $x_1$  axis  and  with  $x_2$  in  the  interface  between  the  fluid  and
the  slab.  Relative  to  the  geometry  of  Figure  1,  we  have  $H = \rho\bar H$,  where

$$\bar H = \frac{c_s}{\cos(\alpha - \beta)}\begin{bmatrix} \phi\cos\beta & -\sin(\alpha - 2\beta) \\ \sin(\alpha - 2\beta) & \cos\alpha \end{bmatrix}, \qquad \phi = c_d/c_s,$$

and  $\alpha$  and  $\beta$  are  given  by  (10).  In  summary,  for  fixed  $a$  and  $b$,  $H$  is  a  linear  function  of  the
density  $\rho$  of  the  slab.  Observe  that  $\bar H$  is  invertible  whenever  $c_d > 0$  and  $c_s > 0$.  Moreover,
$H^{-1} = \rho^{-1}\bar H^{-1}$,  where

$$\bar H^{-1} = \frac{\cos(\alpha - \beta)}{c_d\cos\beta\cos\alpha + c_s\sin^2(\alpha - 2\beta)}\begin{bmatrix} \cos\alpha & \sin(\alpha - 2\beta) \\ -\sin(\alpha - 2\beta) & \phi\cos\beta \end{bmatrix}.$$
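Because sign conventions are easy to get wrong, here is a numerical check that the closed form for $\bar H$ agrees with $H = BA^{-1}$ assembled directly from the vector definitions of $A$ and $B$. The in-plane orientation of $d$, $s$ and $p$ is our assumption about Figure 1, chosen because it reproduces the signs in the representation of $B$ quoted below; the material values are arbitrary.

```python
import math

def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def impedance_from_definition(rho, cd, cs, alpha):
    """H = B A^{-1} assembled from the vector definitions of A and B."""
    mu = rho * cs ** 2
    lam = rho * cd ** 2 - 2.0 * mu
    D, S = 1.0 / cd, 1.0 / cs
    beta = math.asin(cs / cd * math.sin(alpha))      # Snell: D sin(alpha) = S sin(beta)
    n = (1.0, 0.0)
    # Assumed in-plane orientation (our guess at the convention of Figure 1):
    d = (math.cos(alpha), -math.sin(alpha))
    s = (math.cos(beta), -math.sin(beta))
    p = (math.sin(beta), math.cos(beta))              # unit vector orthogonal to s
    col1 = [D * (2 * mu * _dot(d, n) * d[i] + lam * n[i]) for i in range(2)]
    col2 = [S * mu * (_dot(s, n) * p[i] + _dot(p, n) * s[i]) for i in range(2)]
    B = [[col1[0], col2[0]], [col1[1], col2[1]]]
    A = [[d[0], p[0]], [d[1], p[1]]]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]      # equals cos(alpha - beta)
    Ainv = [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]
    H = [[sum(B[i][k] * Ainv[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return H, beta

def impedance_closed_form(rho, cd, cs, alpha, beta):
    """rho * Hbar, Hbar = cs/cos(a-b) [[phi cos b, -sin(a-2b)], [sin(a-2b), cos a]]."""
    phi = cd / cs
    f = rho * cs / math.cos(alpha - beta)
    s2 = math.sin(alpha - 2 * beta)
    return [[f * phi * math.cos(beta), -f * s2], [f * s2, f * math.cos(alpha)]]

H_def, beta = impedance_from_definition(1.0, 2.0, 1.0, 0.5)
H_cf = impedance_closed_form(1.0, 2.0, 1.0, 0.5, beta)
print(H_def)
print(H_cf)
```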


LARGE  $\rho$.  Let  us  now  determine  the  limiting  behavior  of  $\Gamma$  as  $\rho$  tends  to  infinity.  Since
$H$  depends  linearly  on  $\rho$,  we  see  from  (7)  that  each  element  of  $\Gamma$  is  a  rational  function  of  $\rho$;
that  is,  each  element  of  $\Gamma$  is  the  ratio  of  two  polynomials.  Recall  that  a  rational  function  is
either  constant  (independent  of  $\rho$)  or  has  a  finite  number  of  zeros  and  poles.  We  already
discovered  one  case  where  this  rational  function  is  completely  constant:  when  $\Lambda = I$,  $\Gamma =
H_0^{-1}$,  independent  of  $\rho$.  However,  in  general  $\Gamma$  depends  on  $\rho$.  Since  $H^{-1}H_0$  approaches  zero
as  $\rho$  tends  to  infinity,  the  limit  of  $L$  as  $\rho$  tends  to  infinity  is  easily  evaluated:

$$\lim_{\rho\to\infty} L = \bar L, \qquad \bar L = PA\Lambda A^{-1}\bar H^{-1}P\bar H A\Lambda A^{-1}.$$

Referring  to  (7),  $\Gamma$  has  the  following  asymptotic  form  as  $\rho$  tends  to  infinity:

$$\Gamma = \rho^{-1}\bar\Gamma + O(\rho^{-2}), \qquad \bar\Gamma = (I + \bar L)(\bar H - P\bar HP\bar L)^{-1}. \tag{11}$$

Note  that  formula  (11)  only  makes  sense  when  the  quantity  $\bar H - P\bar HP\bar L$  is
nonsingular.  In  particular,  for  the  special  case  $\Lambda = I$,  we  have

$$\bar H - P\bar HP\bar L = 0,$$

which  explains  why  the  asymptotic  limit  (11)  is  incorrect  when  $\Lambda = I$.  However,  for  the  other
choices  of  $\Lambda$,  the  quantity  $\bar H - P\bar HP\bar L$  is  generally  nonsingular.  In  particular,  in  the  case  $\Lambda = iI$,
it  is  readily  verified  that

$$\bar H - P\bar HP\bar L = 2\bar H,$$

which  is  nonsingular  since  $\bar H$  is  nonsingular.  For  the  case  $\Lambda = J$,  let  us  employ  the  coordinate
system  depicted  in  Figure  1.  In  this  coordinate  system,  $P$  is  equal  to  $J$  and  the  matrix
$\bar H - P\bar HP\bar L$  is  a  nonsingular  multiple  of  the  expression


$$BJB^{-1} - JBJB^{-1}J,$$

where  $B$  is  the  matrix  that  appears  in  the  definition  of  the  impedance  tensor,  $H = BA^{-1}$.
Similarly,  taking  $\Lambda = iJ$,  it  follows  that  $\bar H - P\bar HP\bar L$  is  a  nonsingular  multiple  of  the  expression

$$BJB^{-1} + JBJB^{-1}J.$$
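A floating-point spot check of the four reductions of $\bar H - P\bar HP\bar L$ described above, using $\bar L = PA\Lambda A^{-1}\bar H^{-1}P\bar HA\Lambda A^{-1}$, $P = J$ and $\bar H = BA^{-1}$. In particular, for $\Lambda = J$ and $\Lambda = iJ$ the nonsingular right factor works out to $BJA^{-1}$ (our own simplification). $A$ and $B$ are arbitrary nonsingular test matrices, so this checks the algebra only, not a physical case.

```python
import functools

def mul(*Ms):
    """Product of any number of 2x2 matrices (nested lists; complex entries allowed)."""
    def mul2(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return functools.reduce(mul2, Ms)

def lin(alpha, X, beta, Y):
    """alpha*X + beta*Y for 2x2 matrices."""
    return [[alpha * X[i][j] + beta * Y[i][j] for j in range(2)] for i in range(2)]

def inv(X):
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det], [-X[1][0] / det, X[0][0] / det]]

def close(X, Y, tol=1e-12):
    return all(abs(X[i][j] - Y[i][j]) <= tol for i in range(2) for j in range(2))

I = [[1, 0], [0, 1]]
J = [[-1, 0], [0, 1]]
P = J                                    # coordinates of Figure 1: n = (1, 0)
iI = [[1j, 0], [0, 1j]]
iJ = [[-1j, 0], [0, 1j]]

def residual(A, B, Lam):
    """Hbar - P Hbar P Lbar, with Lbar = P A Lam A^-1 Hbar^-1 P Hbar A Lam A^-1."""
    H = mul(B, inv(A))
    ALA = mul(A, Lam, inv(A))
    Lbar = mul(P, ALA, inv(H), P, H, ALA)
    return lin(1, H, -1, mul(P, H, P, Lbar))

A = [[0.9, 0.2], [-0.4, 1.0]]            # arbitrary nonsingular test matrices
B = [[1.7, 0.5], [-0.4, 0.9]]
H = mul(B, inv(A))
M = mul(B, J, inv(B))                    # B J B^-1
N = mul(J, M, J)                         # J B J B^-1 J
tail = mul(B, J, inv(A))                 # nonsingular right factor B J A^-1
print(close(residual(A, B, I), [[0, 0], [0, 0]]),
      close(residual(A, B, iI), lin(2, H, 0, H)),
      close(residual(A, B, J), mul(lin(1, M, -1, N), tail)),
      close(residual(A, B, iJ), mul(lin(1, M, 1, N), tail)))
```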


Focusing  on  $BJB^{-1} \pm JBJB^{-1}J$,  we  have
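Lemma  2.  Given  a  nonsingular  matrix

$$B = \begin{bmatrix} a & c \\ -b & d \end{bmatrix}$$

(here  $a$,  $b$,  $c$,  $d$  are  generic  entries,  unrelated  to  the  phases  $a$  and  $b$  above),
the  expression  $BJB^{-1} - JBJB^{-1}J$  is  nonsingular  if  and  only  if  $a$,  $b$,  $c$,  and  $d$  are  nonzero.  The
expression  $BJB^{-1} + JBJB^{-1}J$  is  nonsingular  if  and  only  if  $bc \ne ad$.

Proof.  This  is  verified  by  evaluating  the  determinants:

$$\det(BJB^{-1} - JBJB^{-1}J) = -\frac{16\,abcd}{(ad + bc)^2}, \qquad \det(BJB^{-1} + JBJB^{-1}J) = -\frac{4(ad - bc)^2}{(ad + bc)^2}. \qquad □$$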
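Lemma 2 can be checked by example in exact arithmetic: the sketch forms $BJB^{-1} \pm JBJB^{-1}J$ for $B = \begin{bmatrix} a & c\\ -b & d\end{bmatrix}$ with integer entries and compares the determinants with the closed forms in the proof. The helper names and the sample entries are ours.

```python
from fractions import Fraction as Fr

def mul(*Ms):
    """Product of 2x2 matrices stored as nested lists."""
    out = Ms[0]
    for Y in Ms[1:]:
        out = [[sum(out[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return out

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def inv(X):
    d = det(X)
    return [[X[1][1] / d, -X[0][1] / d], [-X[1][0] / d, X[0][0] / d]]

J = [[-1, 0], [0, 1]]

def lemma2_dets(a, b, c, d):
    """Exact det(BJB^-1 - JBJB^-1J) and det(BJB^-1 + JBJB^-1J) for B = [[a, c], [-b, d]]."""
    B = [[Fr(a), Fr(c)], [Fr(-b), Fr(d)]]
    M = mul(B, J, inv(B))                 # B J B^-1
    N = mul(J, M, J)                      # J B J B^-1 J
    minus = [[M[i][j] - N[i][j] for j in range(2)] for i in range(2)]
    plus = [[M[i][j] + N[i][j] for j in range(2)] for i in range(2)]
    return det(minus), det(plus)

def lemma2_formulas(a, b, c, d):
    """Closed forms -16abcd/(ad+bc)^2 and -4(ad-bc)^2/(ad+bc)^2."""
    s = Fr(a * d + b * c) ** 2
    return Fr(-16 * a * b * c * d) / s, Fr(-4 * (a * d - b * c) ** 2) / s

print(lemma2_dets(2, 3, 5, 7), lemma2_formulas(2, 3, 5, 7))
```

A zero entry makes the first determinant vanish, and $bc = ad$ makes the second vanish, matching the two statements of the lemma.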


In  [1]  we  provide  the  following  representation  of  $B$  for  the  geometry  depicted  in  Figure  1:

$$B = D\begin{bmatrix} (2\mu + \lambda)\cos 2\beta & \phi\mu\sin 2\beta \\ -\mu\sin 2\alpha & \phi\mu\cos 2\beta \end{bmatrix},$$

where  $\phi = c_d/c_s$.  First  let  us  consider  the  case  $\Lambda = J$,  so  that  $\bar H - P\bar HP\bar L$  is  nonsingular  if  and
only  if  every  element  of  $B$  is  nonzero.  Assuming  the  incident  wave  is  not  normal  to  the
slab  (normal  incidence  was  studied  earlier),  the  (1,2)  and  (2,1)  elements  of  $B$  are  nonzero.
Although  the  (1,1)  and  (2,2)  elements  of  $B$  are  zero  if  $\beta = \pi/4$,  an  infinitesimal  perturbation  in
the  thickness  or  the  frequency  yields  $\beta \ne \pi/4$  and  $\bar H - P\bar HP\bar L$  invertible.  In  the  case  $\Lambda = iJ$,  it
follows  from  Lemma  2  that  $\bar H - P\bar HP\bar L$  is  singular  if  and  only  if

$$(2\mu + \lambda)\cos^2 2\beta = \mu\sin 2\alpha\sin 2\beta.$$

Utilizing  (10),  this  relation  is  equivalent  to

$$\beta = \arctan\sqrt{\gamma + 1 \pm \sqrt{\gamma^2 + 2\gamma}}, \qquad \gamma = 2a/b. \tag{12}$$

Referring  to  (10),  we  see  that  there  are  special  values  for  the  frequency  and  thickness  that  lead
to  singularity;  but  again,  an  infinitesimal  perturbation  of  $T$  or  $\omega$  restores  invertibility.  We  say
that  the  slab  is  singular  if  either  $\Lambda = I$,  $\Lambda = J$  and  $\beta = \pi/4$,  or  $\Lambda = iJ$  and  $\beta$  satisfies  (12).
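The equivalence between the singularity condition and (12) rests on two facts: $\tan\alpha/\tan\beta = b/a$, from (10), and $(2\mu+\lambda)/\mu = (c_d/c_s)^2 = (\sin\alpha/\sin\beta)^2$, from Snell's Laws. In our own reduction these turn the condition into $\tan^2 2\beta = b/a$, whose roots are exactly (12) with $\gamma = 2a/b$. The sketch evaluates both roots and the residual of the original condition; the sample phases are illustrative choices with $\Lambda = iJ$.

```python
import math

def singular_betas(a, b):
    """Both roots of (12): beta = arctan sqrt(g + 1 +/- sqrt(g^2 + 2g)), g = 2a/b."""
    g = 2.0 * a / b
    root = math.sqrt(g * g + 2.0 * g)
    return [math.atan(math.sqrt(g + 1.0 + sign * root)) for sign in (1.0, -1.0)]


def singularity_residual(a, b, beta):
    """[(2mu + lam) cos^2 2beta - mu sin 2alpha sin 2beta] / mu in the slab.

    tan(alpha)/tan(beta) = b/a comes from (10); (2mu + lam)/mu = (c_d/c_s)^2
    = (sin(alpha)/sin(beta))^2 comes from Snell's Laws.
    """
    alpha = math.atan((b / a) * math.tan(beta))
    ratio = (math.sin(alpha) / math.sin(beta)) ** 2
    return ratio * math.cos(2 * beta) ** 2 - math.sin(2 * alpha) * math.sin(2 * beta)


# Lambda = iJ: a = (m + 1/2) pi, b = (n + 1/2) pi with m + n odd, e.g. m = 0, n = 1.
a, b = 0.5 * math.pi, 1.5 * math.pi
for beta in singular_betas(a, b):
    print(math.degrees(beta), singularity_residual(a, b, beta))  # about 60 and 30 degrees
```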

In  summary,  when  the  slab  is  nonsingular,  $\Gamma$  approaches  $\bar\Gamma/\rho$  asymptotically  as  $\rho$
increases.  In  particular,  $\Gamma$  tends  to  zero  as  $\rho$  increases.

SMALL  $\rho$.  Let  us  consider  a  nonsingular  slab  and  the  geometry  depicted  in  Figure  1.  In
this  case,  $n^{\mathsf T}\Gamma n$  equals  the  (1,1)  element  of  $\Gamma$,  which  we  denote  $\gamma$.  Since  $\Gamma$  is  a  rational
function  of  $\rho$,  $\gamma$  is  a  rational  function  of  $\rho$  that  tends  to  zero  as  $\rho$  increases.  In  a  separate
paper,  we  will  show  that  $\gamma > 0$  for  every  choice  of  the  density.  Consequently,  $\gamma$  has  no  poles
along  the  positive  real  axis,  and  we  can  satisfy  (8)  for  some  $\rho$  whenever
$$0 < \frac{n^{\mathsf T}d_1}{\lambda_1 D_1} < \gamma_0,$$

where  $\gamma_0$  denotes  the  limit  of  $\gamma$  as  $\rho$  tends  to  zero  ($\gamma$  approaches  zero  as  $\rho$  becomes  large,  $\gamma$
approaches  $\gamma_0$  as  $\rho$  tends  to  zero,  and  $\gamma$  depends  continuously  on  $\rho$).  Thus  the  value  of  $\gamma_0$
provides  insight  concerning  the  incident  waves  that  can  be  absorbed.

We  have  evaluated  $\gamma_0$  for  each  of  the  choices  $\Lambda = J$,  $\Lambda = iI$,  and  $\Lambda = iJ$.  It  turns  out  that
the  evaluation  of  $\gamma_0$  is  quite  difficult,  since  very  complicated  trigonometric  matrices  must  be
multiplied  together  and  simplified.  With  the  assistance  of  a  symbolic  manipulation  package,
we  found  that  for  $\rho$  near  0  and  for  each  choice  of  $\Lambda$,  $\Gamma$  has  an  expansion  of  the  form

$$\Gamma = \mathcal{T}\rho^{-1} + S^{\mathsf T}H_0^{-1}S + O(\rho),$$

where  the  $\mathcal{T}$  and  $S$  corresponding  to  the  various  choices  of  $\Lambda$  appear  in  Table  1.

Observe  that  in  each  case,  the  (1,1)  element  of  $\mathcal{T}$  is  zero.  Thus  for  each  choice  of  $\Lambda$,  $\gamma_0$
is  the  (1,1)  element  of  $S^{\mathsf T}H_0^{-1}S$,  which  is  easily  evaluated:


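Lemma  3.

$$\gamma_0 = \begin{cases} \dfrac{(H_0^{-1})_{11}}{(1 - 2\cos 2\beta)^2} & \text{when } \Lambda = J,\\[2ex] \dfrac{(H_0^{-1})_{22}\cos^2\alpha}{\sin^2(\alpha - 2\beta)} & \text{when } \Lambda = iI,\\[2ex] \dfrac{(H_0^{-1})_{22}\cos^2\alpha\cos^2(\alpha - \beta)}{\sin\alpha\cos(\alpha + \beta) + \cos\beta\sin 2\beta} & \text{when } \Lambda = iJ. \end{cases}$$

For  a  nonsingular  slab  and  for  each  choice  of  $\Lambda$,  there  exists  a  value  of  $\rho$  that  absorbs  the
incident  wave  whenever

$$0 \le \frac{n^{\mathsf T}d_1}{\lambda_1 D_1} \le \gamma_0. \tag{13}$$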


Finally,  let  us  verify  the  claim  made  at  the  beginning  of  this  paper  concerning  the 
existence  of  a  material  that  totally  absorbs  any  given  incident  wave. 
Table  1.  $\mathcal{T}$  and  $S$  for  various  choices  of  $\Lambda$.




Theorem  1.  For  any  given  incident  wave,  the  mechanical  properties  of  the  slab  can  be 
chosen  so  that  the  amplitude  of  the  reflected  wave  is  zero. 

Proof.  By  equation  (10),  the  angles  $\alpha$  and  $\beta$  can  be  made  arbitrarily  close  to  zero  by
taking  $m$  and  $n$  sufficiently  large.  By  Lemma  3,  $\gamma_0$  tends  to  infinity  as  $\alpha$  and  $\beta$  tend  to  zero  if
$\Lambda = iI$.  Also,  by  (11),  $\gamma$  tends  to  zero  as  $\rho$  increases  when  $\Lambda = iI$.  Since  the  inequality  (13)
holds  for  $m$  and  $n$  sufficiently  large,  there  exists  a  density  for  which  the  slab  totally  absorbs  the
incident  wave.  □

NUMERICAL  EXPERIMENTS.  The  inequality  (13)  provides  a  lower  bound  on  the
range  of  incident  angles  and  frequencies  that  can  be  absorbed.  Although  the  range  of  $\gamma$  as  a
function  of  $\rho$  contains  the  interval  $[0, \gamma_0]$,  the  range  may  potentially  extend  outside  this  interval.
Experimentally,  we  find  that  $\gamma$  is  nearly  a  monotone  function  of  $\rho$,  so  that  the  interval  $[0, \gamma_0]$
accurately  predicts  the  incident  waves  that  can  be  absorbed.  Figures  2,  3,  and  4  show  typical
plots  of  $\gamma$  as  a  function  of  $\rho$  for  various  choices  of  $\Lambda$.  These  graphs  correspond  to  a  steel-like
material  in  the  left  half-space  and  water  in  the  right  half-space.  In  particular,  the
following  mechanical  parameters  were  employed:

Right  half-space:  $\rho_1 = 1$  g/cm³,  $c_{d1} = 140000$  cm/s,  $\lambda_1 = \rho_1 c_{d1}^2$.

Left  half-space:  $\rho_0 = 7$  g/cm³,  $\lambda_0 = \lambda_1$,  $\mu_0 = \lambda_0/3$.

Slab:  $T = 5$  cm.

Incident  wave:  $\alpha_1 = 30$  degrees,  $\omega = 600\pi$  rad/s.

Since  the  graphs  in  Figures  2,  3,  and  4  appear  monotone,  the  range  of  $\gamma$  is  accurately
estimated  by  the  interval  $[0, \gamma_0]$.  (Note,  though,  that  the  numerical  values  of  $\gamma$  in  Figure  4
deviate  from  monotonicity  in  the  fourth  significant  digit  near  $\rho = 0$,  a  deviation  that  is
imperceptible  to  the  eye  but  large  enough  to  undermine  any  proof  of  monotonicity
for  $\gamma$.)  In  Figure  5  we  plot  the  density  of  the  material  that  totally  absorbs  the  incoming  wave
versus  the  angle  of  incidence.  Observe  that  as  the  angle  of  incidence  approaches  90  degrees
(with  the  wave  speed  fixed),  the  density  tends  to  infinity.  In  Figure  6  we  plot  the  density  of
the  material  that  totally  absorbs  the  incoming  wave  versus  the  wave  frequency.  Observe  that
as  the  frequency  tends  to  zero  (with  the  wave  speed  fixed),  the  density  tends  to  infinity,  and  as  the
frequency  tends  to  infinity,  the  density  tends  to  zero.

Numerically,  we  investigated  singular  slabs  associated  with  $\Lambda = J$  and  $\Lambda = iJ$.  We  found
that  $\gamma$  was  equal  to  $\gamma_0$,  independent  of  $\rho$  (for  fixed  wave  speed).

Figure  2.  $\gamma$  versus  density  for  $\Lambda = J$,  $m = 1$,  and  $n = 2$.
ABSTRACT

A  shear  block  approach  has  been  used  to  model  the  transient  shear  response  of  a  rigid  block-flexible  support  system
subjected  to  a  nonpenetrating  side-on  hypervelocity  projectile  impact.  The  initial  velocity  imparted  to  the  shear
block  by  the  impact  has  been  calculated  using  a  momentum  balance  between  the  projectile  and  the  rigid  block  and
has  been  imposed  as  an  initial  condition  for  the  transverse  dynamic  equation  of  motion  of  the  block-support  assembly.
The  spring  constant  of  the  support  has  been  evaluated  from  the  support  height  and  the  shear  area.  The  forcing
function  has  been  computed  assuming  that  the  entire  projectile  length  is  consumed  at  a  constant  rate  in  a  finite  length
of  time  and  that  a  triangular  force-time  relationship  is  imposed  on  the  system.  The  nonhomogeneous  transverse  equation
of  motion  for  the  assembly  is  solved  for  the  transverse  displacement,  velocity  and  acceleration,  and  the  constants  for  the
complementary  and  particular  parts  of  the  solution  are  evaluated  using  a  set  of  initial  and  boundary  conditions.
The  displacement  solution  is  optimized  by  setting  the  velocity  equal  to  zero  to  obtain  a  peak  response  time
at  which  the  displacement  is  an  extremum.  The  acceleration  at  this  time  is  found  to  be  negative,  ensuring  that  the
solution  for  displacement  is  a  maximum.  Once  the  peak  transverse  displacement  of  the  block-support  system
is  known,  the  peak  shear  stress  and  strain  can  be  easily  calculated  and  compared  to  the  shear  yield  strength  of  the  parent
material  in  order  to  ensure  the  structural  integrity  of  the  system  from  a  shear  strength  standpoint  or  to  predict  the
occurrence  of  dynamic  shear  failure  of  the  assembly  at  the  interface  between  the  block  and  the  support.


INTRODUCTION 

The  capability  to  predict  the  effect  of  hypervelocity  impact  of  a  missile  upon  a  rigid  or
deformable  structure  is  a  necessary  first  step  towards  the  design  and  safe  operation  of
nuclear  reactors  (1,2)  as  well  as  defense  systems  subjected  to  extreme  environments.  This
problem  is  also  of  considerable  interest  to  the  Ballistic  Research  Laboratory  (BRL)  because  of  the
possibility  of  sustaining  severe  damage  at  a  vulnerable  location  of  the  target  structure  when  it  is
impacted  by  a  projectile  at  a  specific  angle  of  obliquity.

A  number  of  studies  have  been  performed  and  damage  data  gathered  (3-7)  over  the  years.
However,  most  available  data  are  in  the  form  of  impulse  correlation  curves  and  crater  shapes
in  plates  impacted  by  slender  rods,  while  relatively  little  has  been  reported  on  the  dynamic
stress-strain  response  of  multibody  systems  consisting  of  interconnected  rigid  and  deformable
bodies  subjected  to  impact  and  sudden  changes  in  the  structure.
Recently,  computations  using  hydrodynamic  codes  (8-10)  have  been  reported.  Unfortunately,
setting  up  an  accurate  computational  model,  running  and  assimilating  the  code  computations,  and
correctly  interpreting  the  results  are  expensive  and  time  consuming,  and  require
considerable  expertise  on  the  part  of  the  project  leader.  Because  of  limited  time  and  cost  constraints,
it  was  decided  early  on  to  resort  to  a  feasible  analytical  approach  in  lieu  of  a  numerical
one;  such  an  approach  eliminates  undue  complexities  of  the  real  problem  while  retaining  the
essential  features  of  the  loading  process  and  giving  insight  into  the  impact  phenomena
and  the  dominant  stress  and  failure  mechanisms.

IMPACT  CONDITION 

Let  us  assume  that  a  large  projectile  of  mass  $M_1$  travelling  with  an  initial  velocity  $V_1$
in  a  horizontal  direction  collides  with  a  stationary  massive  object  of  mass  $M_2$  supported
underneath  by  a  series  of  plates,  which  in  turn  are  connected  to  an  even  larger  mass  by
means  of  continuous  double  seam  welding.  Because  of  the  nature  of  these  masses  and  the  type  of
construction,  shear  phenomena  appear  to  dominate  stresses  and  failure  in  such  structures,
rather  than  bending,  which  is  the  governing  mechanism  in  mass-beam  coupled  systems.

Assuming  the  target  to  be  rigid,  a  constant  average  deceleration  rate  upon  impact
based  upon  a  linear  decay  of  velocity  from  $V_1$  to  zero,  and  an  average  duration
time  to  consume  the  total  length  of  the  impactor,  it  is  possible  to  calculate  a  linearly  decaying
forcing  function  with  a  triangular  equivalent  impulse,  which  can  be  imposed  upon  the  rigid
mass  in  a  side-on  horizontal  direction,  such  that

$$F_p = M_1(V_1 - 0)/(T_f - T_i) = M_1V_1/T \tag{1}$$

where  $T = T_f - T_i$  is  the  duration  time  and  $F_p$  is  the  decelerating  force.

Invoking  a  momentum  balance  between  the  impactor  and  the  target  mass,  which  is  now
allowed  to  move  in  a  horizontal  direction,  it  is  possible  to  compute  the  final  velocity  imparted
to  the  target  as  follows:

$$M_1V_1 = V_2(M_1 + M_2),$$

or

$$V_2 = M_1V_1/(M_1 + M_2), \tag{2}$$

where  $M_1$  is  the  impacting  mass  with  an  initial  velocity  $V_1$  and  $M_2$  is  the  target  mass.
Once  the  imparted  velocity  of  the  target  mass  is  known,  it  can  be  imposed  as  a  constraint
condition  for  the  equation  of  motion  to  solve  the  boundary  value  problem.

PROBLEM  FORMULATION 

Prior  to  the  shear  stress  computation,  it  is  necessary  to  obtain  the  dynamic  equation  of  motion
of  the  target-support  assembly  in  the  form

$$M_2\ddot x + Kx = F(t) \tag{3}$$

where  $F(t)$  is  the  externally  applied  force  upon  the  target,  $K$  is  the  support  stiffness  and
$x$  is  the  horizontal  displacement  of  the  target,  as  shown  in  Figure  1.  From  this  figure,  the  shear
strain  $\gamma$  at  the  block  interface  and  the  shear  stress  $\tau$  can  be  given  as

$$\gamma = x/h, \qquad \tau = G\gamma \tag{4}$$

where  $h$  is  the  height  of  the  shear  block  support  and  $G$  is  the  shear  modulus.

The  spring  constant  for  the  support  is  evaluated  by  referring  to  the  free  body  diagram
shown  in  Figure  2,  where  $F_s$  is  the  shear  force  at  the  interface,  given  as

$$F_s = 2\tau A = 2G\gamma A = 2GA(x/h) = kx, \qquad k = 2GA/h \tag{5}$$

where  $A$  is  the  shear  area  at  the  interface  between  the  block  and  the  plate.  With  $K = k$,  the  equation
of  motion  of  the  block-support  system  can  be  rewritten  as

$$M_2\ddot x + 2GAx/h = F(t). \tag{6}$$


METHOD  OF  SOLUTION 

The  dynamic  equation  of  motion  of  the  block-support  assembly  subjected  to  a  horizontal
side-on  impact  load,  as  given  in  the  previous  section,  needs  to  be  solved  for  the  time-dependent
displacement  $x$,  subject  to  the  constraint  conditions  that  the  initial  displacement  is  zero  at
time  $t = 0$,  when  the  initial  velocity  is  $V_2$,  the  initial  target  velocity  obtained  earlier  from
the  momentum  balance.

The  forcing  function  $F$  can  now  be  assumed  to  be  a  triangular  force-time  curve  with
linear  decay  of  the  form

$$F = F_p[1 - t/t_p] \tag{7}$$

where  $F_p$  is  the  peak  impact  force  and  $t_p$  is  the  positive  phase  duration.  At  $t = 0$,  $F$
reduces  to  $F_p$,  and  at  $t = t_p$,  $F$  vanishes,  which  satisfies  the  initial  constraint  conditions.


The  equation  of  motion  can  now  be  rewritten  as

$$M_2\ddot x + 2GAx/h = F_p[1 - t/t_p]. \tag{8}$$

The  solution  of  the  above  equation  of  motion  can  be  expressed  as  a  sum

$$x(t) = x_c(t) + x_p(t), \tag{9}$$

where  $x_c(t)$  is  the  complementary  solution  satisfying  the  homogeneous  equation

$$\ddot x + 2GAx/(M_2h) = 0 \tag{10}$$

and  $x_p(t)$  is  the  particular  solution  satisfying  the  nonhomogeneous  equation

$$\ddot x + 2GAx/(M_2h) = F_p[1 - t/t_p]/M_2. \tag{11}$$


A complementary solution of the standard homogeneous equation above can be given as

$$x_c(t) = A\cos(\omega t) + B\sin(\omega t) \qquad (12)$$

where A and B are constants to be evaluated from the initial and boundary conditions, and ω is given by

$$\omega^2 = 2GA/(M_2h) \qquad (13)$$

Similarly, following the procedure outlined above with some modification for the nonhomogeneous part of the equation of motion, a particular solution can be obtained as a function of the peak load, target mass, plate stiffness, positive-phase duration and elapsed time, as shown below:

$$x_p(t) = \frac{F_p}{\omega^2 M_2}\left(1 - t/t_p\right) \qquad (14)$$
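As a quick check of (14), assuming the linear load decay of (7): because $x_p$ is linear in t, $\ddot{x}_p = 0$, so substituting it into (11) leaves only the stiffness term, with $\omega^2 = 2GA/(M_2h)$:

$$\ddot{x}_p = 0 \;\Longrightarrow\; \omega^2 x_p(t) = \frac{F_p}{M_2}\left(1 - \frac{t}{t_p}\right) \;\Longrightarrow\; x_p(t) = \frac{F_p}{\omega^2 M_2}\left(1 - \frac{t}{t_p}\right)$$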


Hence the total solution for the horizontal displacement of the shear block is

$$x(t) = A\cos(\omega t) + B\sin(\omega t) + \frac{F_p}{\omega^2 M_2}\left(1 - t/t_p\right) \qquad (15)$$


Once the displacement-time history of the impacted structure is known, the horizontal velocity of the block can be obtained by differentiating the above equation with respect to time, which results in

$$\dot{x}(t) = B\omega\cos(\omega t) - \left[A\omega\sin(\omega t) + \frac{F_p}{\omega^2 M_2 t_p}\right] \qquad (16)$$

where A and B are constants evaluated from the initial and boundary conditions for the problem.
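A short sketch of how A and B follow, assuming the initial conditions stated under METHOD OF SOLUTION (zero initial displacement and initial velocity $V_2$ at t = 0): setting t = 0 in (15) and (16) gives

$$x(0) = A + \frac{F_p}{\omega^2 M_2} = 0 \;\Longrightarrow\; A = -\frac{F_p}{\omega^2 M_2}$$

$$\dot{x}(0) = B\omega - \frac{F_p}{\omega^2 M_2 t_p} = V_2 \;\Longrightarrow\; B = \frac{1}{\omega}\left(V_2 + \frac{F_p}{\omega^2 M_2 t_p}\right)$$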

The acceleration-time relationship for the target-support assembly is obtained by differentiating the velocity in the equation above with respect to time, which yields

$$\ddot{x}(t) = -\left(B\omega^2\sin(\omega t) + A\omega^2\cos(\omega t)\right) \qquad (17)$$

The minus sign on the right-hand side of the equation indicates negative acceleration, or deceleration, of the block with time, which is to be expected from the restraining action of the welded supporting plates underneath the block.
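The closed-form response (15)-(17) can be evaluated directly. The sketch below is illustrative only: it assumes the constants A and B come from $x(0) = 0$ and $\dot{x}(0) = V_2$ as stated under METHOD OF SOLUTION, renames the shear area to `A_s` (a hypothetical name chosen to avoid a clash with the constant A), and is meaningful only for $0 \le t \le t_p$, where the triangular load (7) applies.

```python
import math


def shear_block_response(t, M2, G, A_s, h, Fp, tp, V2):
    """Return (x, x_dot, x_ddot) at time t from Eqs. (15)-(17).

    Sketch only: assumes x(0) = 0 and x_dot(0) = V2, and is meaningful
    for 0 <= t <= tp, where the triangular load F = Fp*(1 - t/tp) applies.
    A_s is the interface shear area (renamed to avoid clashing with A).
    """
    K = 2.0 * G * A_s / h                    # support stiffness, K = 2GA/h
    w = math.sqrt(K / M2)                    # omega^2 = 2GA/(M2 h)
    q = Fp / (w**2 * M2 * tp)                # slope term of the particular solution
    A = -Fp / (w**2 * M2)                    # from x(0) = 0
    B = (V2 + q) / w                         # from x_dot(0) = V2
    s, c = math.sin(w * t), math.cos(w * t)
    x = A * c + B * s + q * (tp - t)         # Eq. (15)
    x_dot = B * w * c - (A * w * s + q)      # Eq. (16)
    x_ddot = -(B * w**2 * s + A * w**2 * c)  # Eq. (17)
    return x, x_dot, x_ddot
```

Substituting the returned values into $M_2\ddot{x} + Kx$ reproduces $F_p(1 - t/t_p)$ up to rounding, which is a convenient self-check on any parameter set.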

OPTIMIZATION  PROCEDURE 

To predict the magnitude of the peak displacement, and of the peak shear stresses and strains realized by the shear block at the interface between the block and the beam, it is necessary to determine the time at which the peak response occurs. Some means of optimizing the peak response is therefore needed to arrive at this optimum occurrence time.
A standard mathematical approach to optimization is adopted in lieu of a trial-and-error minimization scheme. To maximize the displacement, its derivative with respect to time, the velocity, is set to zero:

$$B\omega\cos(\omega t_{op}) - \left[A\omega\sin(\omega t_{op}) + \frac{F_p}{\omega^2 M_2 t_p}\right] = 0 \qquad (18)$$

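where $t_{op}$ is the optimum time at which the peak displacement response occurs. This equation must be solved for the unknown optimum time. In the particular case where $\omega t_{op}$ is small, so that $\cos(\omega t_{op}) \approx 1$ and $\sin(\omega t_{op}) \approx \omega t_{op}$, the equation simplifies to

$$B\omega - A\omega^2 t_{op} - \frac{F_p}{\omega^2 M_2 t_p} = 0 \qquad (19)$$

or

$$t_{op} = \frac{1}{A\omega}\left[B - \frac{F_p}{M_2 t_p\omega^3}\right] \qquad (20)$$

The peak displacement can now be computed by substituting this expression for the optimum time into the displacement-time relationship obtained earlier, which, under the same small-angle approximation, reduces to

$$x^{*} = A + B\omega t_{op} + \frac{F_p}{M_2\omega^2}\left(1 - t_{op}/t_p\right) \qquad (21)$$

where $x^{*}$ is the peak displacement of the block assembly at time $t_{op}$. Substituting the value of the optimum occurrence time into this equation gives an algebraic expression for the peak displacement of the block:

$$x^{*} = \frac{A^2 + B^2}{A} + \frac{F_p}{\omega^2 M_2}\left[1 - \frac{2B}{A\omega t_p} + \frac{F_p}{A M_2\omega^4 t_p^2}\right] \qquad (22)$$

Equivalently, writing $c = F_p/(M_2 t_p\omega^3)$, so that $t_{op} = (B - c)/(A\omega)$, the peak displacement is $x^{*} = A + (B - c)^2/A + F_p/(\omega^2 M_2)$.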


where $F_p$, ω, $M_2$ and $t_p$ are previously defined known quantities with specific values for a particular problem, and A and B are constants evaluated from the initial and boundary conditions.

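The displacement is guaranteed to be a local maximum provided that the second derivative of the displacement with respect to time, the acceleration, is negative at the optimum time of occurrence:

$$-\left(B\omega^2\sin(\omega t_{op}) + A\omega^2\cos(\omega t_{op})\right) < 0 \qquad (23)$$

or, dividing by $\omega^2\cos(\omega t_{op})$ and assuming $B > 0$ and $\cos(\omega t_{op}) > 0$,

$$\tan(\omega t_{op}) > -A/B \qquad (24)$$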

For each optimum time $t_{op}$, the above inequality must be checked for the specific problem to ensure that the peak displacement is indeed a maximum. Similarly, the optimum response time at which the velocity of the shear block attains a peak can be determined by setting the right-hand side of the acceleration equation (17) equal to zero and verifying that the derivative of the acceleration with respect to time is negative at this time of occurrence, which ensures that the peak velocity is a maximum.
RESULTS AND DISCUSSION


Although it may be possible to solve exactly for the optimum time from the equations resulting from the optimization procedure described in the previous section, for most problems it is sufficient to adopt a trial-and-error approach in which suitable trial values of the optimum time are substituted into the left-hand side of the velocity equation (18). The difference between the calculated velocity and zero, the right-hand side of the equation, is treated as an error, which is minimized by adjusting the optimum time until it nearly vanishes.
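The trial-and-error search can be made systematic. The sketch below assumes the same closed-form constants as derived from $x(0) = 0$ and $\dot{x}(0) = V_2$, and uses hypothetical parameter names (`A_s` for the shear area, `n_scan` and `tol` for the search settings). It marches through $[0, t_p]$ until the left-hand side of (18) changes sign (the "trial" step), then bisects that bracket, so it does not rely on the small-$\omega t_{op}$ approximation (19)-(20). It also checks condition (23).

```python
import math


def peak_displacement(M2, G, A_s, h, Fp, tp, V2, n_scan=1000, tol=1e-12):
    """Return (t_op, x_peak) for the first zero of Eq. (18) in [0, tp],
    or None if no qualifying maximum is found in that interval."""
    K = 2.0 * G * A_s / h
    w = math.sqrt(K / M2)
    q = Fp / (w**2 * M2 * tp)
    A = -Fp / (w**2 * M2)
    B = (V2 + q) / w

    def vel(t):   # left-hand side of Eq. (18)
        return B * w * math.cos(w * t) - (A * w * math.sin(w * t) + q)

    def disp(t):  # Eq. (15)
        return A * math.cos(w * t) + B * math.sin(w * t) + q * (tp - t)

    # Trial step: march forward until the velocity goes from + to <= 0.
    dt = tp / n_scan
    lo = 0.0
    for i in range(1, n_scan + 1):
        hi = i * dt
        if vel(lo) > 0.0 >= vel(hi):
            break
        lo = hi
    else:
        return None

    # Refinement: bisection keeps vel(lo) > 0 >= vel(hi).
    while hi - lo > tol * tp:
        mid = 0.5 * (lo + hi)
        if vel(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    t_op = 0.5 * (lo + hi)

    # Eq. (23): the acceleration must be negative at a maximum.
    acc = -(B * w**2 * math.sin(w * t_op) + A * w**2 * math.cos(w * t_op))
    if acc >= 0.0:
        return None
    return t_op, disp(t_op)
```

Bisection was chosen because it only needs the sign of the velocity and cannot diverge once a bracket is found; for parameter sets where $\omega t_{op}$ is not small, the result can differ noticeably from (20).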

Once the peak shear displacement is obtained as outlined, the peak shear strain follows as the ratio of the peak transverse displacement to the height of the support plates. The peak shear stress at the interface between the block and the beam is then obtained by multiplying the shear strain by the shear modulus of the block-support material. This shear stress can be compared with the ultimate or yield shear strength of the parent material to assess the structural integrity of the assembly. A factor of safety can be computed as the ratio of the ultimate or yield strength of the support-plate material to the actual shear stress developed at the interface. If the factor of safety is less than or equal to 1.0, shear failure at the interface is indicated and the block-support assembly must be redesigned. If the factor of safety is greater than 1.0, a margin of safety can be given as a measure of structural integrity.
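The strength check described above can be written out in a few lines. This is a sketch: the allowable stress `tau_allow` (the yield or ultimate shear strength) is a hypothetical input name, and the margin of safety is taken as FS - 1, a common convention that the text does not spell out.

```python
def shear_check(x_peak, h, G, tau_allow):
    """Return (gamma, tau, FS, MS) for a peak displacement x_peak.

    MS is reported only when FS > 1.0; MS = FS - 1 is an assumed convention.
    """
    gamma = x_peak / h                     # Eq. (4): shear strain gamma = x/h
    tau = G * gamma                        # Eq. (4): shear stress tau = G*gamma
    fs = tau_allow / tau                   # factor of safety: strength / actual stress
    ms = fs - 1.0 if fs > 1.0 else None    # FS <= 1.0 indicates shear failure
    return gamma, tau, fs, ms
```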

Although the analysis relies on several simplifying assumptions regarding the loading function and the details of the assembly, it gives valuable insight into the dynamic shear response of a class of structures subjected to side-on impact loading. The analysis could be extended to side-on overpressure loading from a blast by modifying the forcing function and reformulating the equation of motion, which would lead to a somewhat different type of solution appropriate for explosive loading. The procedure outlined above is a quick and inexpensive method for computing the response of structures dominated by shear phenomena at interfaces.

ACKNOWLEDGEMENTS 

The valuable assistance of Drs. Andrew V. Mark and Joseph M. Santiago of the Terminal Ballistics Division during the course of this investigation is gratefully acknowledged.


REFERENCES 

1. J.T. Gordon Jr. and J.E. Reaugh, "Strain-Rate Effects on Turbine Missile Casing Impact," Computers and Structures, Vol. 13, pp. 311-318, 1981.

2. H.R. Yoshimura and J.T. Schauman, "Preliminary Results of Turbine Missile Casing Tests," EPRI Research Project Report No. 399, EPRI, Palo Alto, California, 1978.

3. A.D. Gupta, "Impact of an Elastic Perfectly-Plastic Plate on a Rigid Target," Volume 3, Bk. No. G0431C, Proceedings of the 1988 ASME International Computers in Engineering Conference, San Francisco, CA, August 1-3, 1988.




4. J.T. Dehn, "Models of Explosively Driven Metals," U.S. Army Ballistic Research Laboratory Technical Report No. BRL-TR-2626, Aberdeen Proving Ground, MD, 1984.

5. R.M. Norman, "Deformation in Flat Plates Exposed to HE Mine Blast," AMSAA-TM-74, U.S. Army Materiel Systems Analysis Agency, APG, MD, 1970.

6. N.E. Hoskin, J.W. Allan, W.A. Bailey, J.W. Lethaby and I. Skidmore, "The Motion of Plates and Cylinders Driven at Tangential Incidence," Fourth International Symposium on Detonation, ONR ACR-126, p. 14, 1965.

7. J.A. Zukas, T. Nicholas, H.F. Swift, L.B. Greszczuk and D.R. Curran, "Impact Dynamics," pp. 150-165, John Wiley and Sons, 1982.

8. B.D. Lambourn and J.E. Hartley, "The Calculation of the Hydrodynamic Behavior of Plane One-Dimensional Explosive/Metal System," Fourth International Symposium on Detonation, ONR ACR-126, 1965.

9. W.E. Johnson, "Code Correlation Study," Air Force Weapons Laboratory Report No. AFWL-TR-70-144, Kirtland Air Force Base, Albuquerque, NM, 1971.

10. R.E. Lottero and K.D. Kimsey, "A Comparison of Computed versus Experimental Loading and Response of a Flat Plate Subjected to Mine Blast," U.S. Army Ballistic Research Laboratory Report No. ARBRL-MR-03249, Aberdeen Proving Ground, MD, 1978.




ON THE CONTINUUM MECHANICS OF
THE MOTION OF A PHASE INTERFACE¹

Morton E. Gurtin
Department of Mathematics
Carnegie-Mellon University
Pittsburgh, PA 15213

ABSTRACT. A recent series of papers [G, AG, GS] began an investigation whose goal is a thermomechanics of two-phase continua based on Gibbs's notion of a sharp phase interface endowed with thermomechanical structure. In [G] a new balance law, balance of capillary forces, was introduced and then applied in conjunction with suitable statements of the first two laws of thermodynamics; the chief results are thermodynamic restrictions on constitutive equations, exact and approximate free-boundary conditions at the interface, and a hierarchy of free-boundary problems. [AG] applied this theory to perfect conductors, in which the underlying equations reduce to a single evolution equation for the interface. [G] and [AG] were limited to rigid systems; [GS] extends the theory to include bodies that deform as they solidify or melt. These theories involve several new concepts, examples being: the creation of new material points; work intrinsic to a moving interface; the formulation of conservation laws for a moving interface. Here I shall discuss some of the new ideas involved in [GS].

MECHANICS AND ENERGETICS OF DEFORMING, ACCRETING
CRYSTALS. In [GS],² the body, ostensibly a crystal, is allowed:

¹Supported by the U.S. Army Research Office.

²[GS] was motivated by studies of Leo and Sekerka [LS], Alexander and Johnson [AJ, JA], and Larche and Cahn [LC], which derive equilibrium relations for the crystal surface as Euler-Lagrange equations corresponding to a stationary global Gibbs function. Such derivations are appropriate to statics but tend to obscure the fundamental nature of balance laws as basic axioms in any dynamical framework that includes inertia and dissipation.
(i) to crystallize through the addition or deletion of material
points  at  the  crystal  surface,  a  process  termed  accretion; 

(ii)  to  deform. 

In  conjunction  with  these  kinematical  processes,  two  distinct 
force  systems  are  introduced: 

(i)  a  system  of  accretive  forces  which  acts  within  the  crystal 
lattice  to  drive  the  crystallization  process; 

(ii)  a  system  of  deformational  forces  to  be  identified  with  the 
more  or  less  standard  forces  that  act  in  response  to  the  local 
motion  of  material  points. 

Because  of  the  nonclassical  nature  of  accretive  forces,  it  is 
not  at  all  clear  that  there  should  be  an  accompanying  balance  law, 
let  alone  what  it  should  be  and  how  it  should  relate  to  the 
deformational  system.  For  that  reason  the  underlying  mechanical 
balance  laws  are  derived  from  the  requirement  that  the 
mechanical production (the rate of kinetic energy minus the rate
of working) be independent of the observer. Here it is necessary
to introduce a new idea, that of a lattice observer: in addition to
the standard observer who measures the gross velocities of the
continuum, there is a second observer,³ who studies the lattice
and measures the velocity of the accreting crystal surface. This
procedure leads not only to the "standard" balance laws for linear
and  angular  momentum,  but  to  new  laws  expressing  balance  of 
(micro)forces  and  (micro)moments  within  the  crystal  lattice  at  the 
crystal  surface. 

One  of  the  chief  differences  between  theories  involving  phase 
transitions  and  the  more  classical  theories  of  continuum  mechanics 
is  the  creation  and  deletion  of  material  points  as  the  phase 
interface  moves  relative  to  the  underlying  material.  We  associate 
with  this  process  internal  forces  whose  working  provides  an 
outflow  of  "mechanical  energy"  associated  with  the  attachment  and 
release  of  atoms  as  they  are  exchanged  between  phases.  We  write 
an  energy  balance  relating  these  internal  forces,  the  forces 

³The use of more than one observer might be useful in other continuum theories, such as theories of liquid crystals, of structured continua, or of mixtures, in which "force"-balance laws over and above the standard laws arise.
described previously, and the bulk energy of the two phases at the
crystal  surface. 

COHERENT CRYSTAL-CRYSTAL INTERACTIONS. To illustrate the results of the general theory,⁴ consider an isothermal crystal-crystal interaction,⁵ in which the environment consists of a second solid phase of the crystal material, and in which the reference lattices can be chosen to match exactly at the interface, even though the states of stress and deformation will generally differ across the interface. For such an interface, balance of linear momentum has the form

$$\mathrm{div}_s\,\mathbb{S} + (S_\beta - S_\alpha)\,n = \rho V\,(v_\alpha - v_\beta), \qquad \text{(LM)}$$

while the accretive laws for force and energy may be combined to
form a single accretive balance law

«  (Spn).(Fpn)  -  (San)«(Fan)  ♦ 

ipv2{  iF^nl2  -  IFpnl2}  ♦  (AB) 

it  -  o’k  -  div*e  +  (Ft§) ■  L . 

Here α and β identify the two phases; S, v, W, and F (appropriately labelled) designate the bulk Piola-Kirchhoff stress, the bulk velocity, the bulk free energy, and the bulk deformation gradient; ρ is the common referential density of the two phases; σ, $\mathbb{S}$, ξ and π are the surface tension, the interfacial Piola-Kirchhoff stress, the accretive shear, and the normal attachment force; n is the outward unit normal to phase α; V, L, K, and $\mathrm{div}_s$ are the normal velocity, the curvature tensor, twice the mean curvature, and the surface divergence for the interface.

The  balance  laws  (LM)  and  (AB)  are  general  relations, 
independent  of  the  particular  material  under  consideration.  [GS] 
gives  a  thermodynamic  argument  in  support  of  the  interfacial 

⁴[GS] also derives equations for a solid crystal in a liquid melt.

⁵Cf. Larche and Cahn [LC].


constitutive  equations 


$$\sigma = \psi(\mathbb{F}, n), \qquad \mathbb{S} = \partial_{\mathbb{F}}\psi(\mathbb{F}, n), \qquad \xi = -D_n\psi(\mathbb{F}, n), \qquad \pi = \beta(\mathbb{F}, n)\,V, \qquad \text{(CE)}$$

where ψ(𝔽, n) is a constitutive function for the interfacial free energy, 𝔽 is the tangential deformation gradient, $D_n$ is the derivative with respect to n following the interface, and β(𝔽, n) > 0 is a material function.
ABSTRACT

In this paper, we analyze the inverse problem in which residual surface displacements are used to evaluate nonelastic deformation in a domain of a solid, called the damage domain. The problem is taken as an example to elucidate the nonlinearity of a class of inverse problems.

The problem can be formulated as a system of multi-dimensional Fredholm integral equations of the first kind. It is a complicated nonlinear problem, since both the damage domain (which appears as the domain of integration in the integral equation) and the nonelastic strains are unknown. The surface data are not sufficient to determine the shape of the damage domain and the exact distribution of the nonelastic strains. However, these data can be used to obtain some important characteristic quantities associated with the nonelastic deformation of the solid, such as the elastic energy, the stresses in certain regions of the solid, or the fracture toughness enhancement due to localized nonelastic deformation.

The research shows an interesting example of conversions between nonlinear and linear problems. By introducing the concept of an equivalent damage domain, the general nonlinear problem is first converted into a linear one, which is more tractable but still ill-posed. A variational problem is then imposed. This leads to a new linear problem with a parameter determined by a nonlinear algebraic function. The payoff of the second conversion is the well-posedness (uniqueness and stability) of the new problem. This new problem is essentially nonlinear again, but much easier than the original nonlinear problem. A numerical scheme is easily constructed owing to the monotonic property of the nonlinear algebraic function.


1.  Introduction 

In recent years, inverse problems have become increasingly important in many scientific fields. Inverse scattering problems deal with the determination of the existence, locations and sizes of defects in mechanical structures from measurements of scattered ultrasonic waves. Increasing numbers of results, especially experimental ones, have been reported (e.g., Ogura, 1983). In inverse problems of vibration, natural frequencies are used to reconstruct the mass distribution of a structure (e.g., Gladwell, 1986). Intensive work has been done in this area, particularly for in-line discrete systems and one-dimensional continuous systems, in which the corresponding mathematical problems are relatively simple and analytical results can be derived. Backus and Gilbert (1967, 1970, 1980) have studied the problem of determining the density distribution in the earth, as well as wave velocities, from observed travel-time data, together with the known mass and moment of inertia of the earth and the frequencies of certain normal modes of vibration.

This paper deals with inverse problems in solid mechanics. Our objective is to characterize nonelastic deformation in the bulk after a series of loadings using only the residual surface displacements instead of the entire loading history. The residual surface displacements are relative and are defined as the difference between the initial and final values of the displacements.


Fig. 1. - A traction-free body D with a subregion Ω where residual nonelastic strains have accumulated.


Suppose that nonelastic strains $\varepsilon^p_{ij}$ are caused in a subdomain Ω (damage domain) of a given body D after a series of unknown loadings (Fig. 1). The integral equation relating them to the residual surface displacements $u_i$ is (Gao and Mura, 1989)

$$\int_{\Omega} C_{ijkl}\,G_{km,l}(x - x')\,\varepsilon^p_{ij}(x)\,dx = \int_{\partial D} C_{ijkl}\,G_{km,l}(x - x')\,u_i(x)\,n_j\,ds + \tfrac{1}{2}\,u_m(x'), \qquad x' \in \partial D \qquad (1)$$

where ∂D is the boundary of D; $n_j$ is the outer normal of ∂D; $C_{ijkl}$ is the elastic modulus tensor of the material; and $G_{km}(x - x')$ is the Green's function for an infinite elastic medium, i.e., $G_{km}(x - x')$ is the displacement at point x in the $x_k$ direction due to a unit force at point x' in the $x_m$ direction. $G_{km,l}(x - x')$ represents $(\partial/\partial x_l)\,G_{km}(x - x')$.

Equation (1) can be obtained using Betti's reciprocal theorem. We refer readers to Gao and Mura (1989) for a detailed derivation.

2. Uniqueness

Our objective is to determine the nonelastic strains $\varepsilon^p_{ij}$ and the domain Ω (a nonlinear problem). However, neither of these two quantities can be determined uniquely from equation (1).

Let Ω be a domain inside the body D. When a distribution of nonelastic strains is compatible in Ω, the remainder D − Ω is not disturbed; hence, the displacements and stresses in D − Ω vanish. This implies that the homogeneous form of equation (1) has a nonzero solution $\varepsilon^p_{ij}(x)$ for an arbitrarily chosen domain Ω. It is then clear that $\varepsilon^p_{ij}$ and Ω cannot be determined uniquely from equation (1).
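The non-uniqueness argument has a simple finite-dimensional analogue, shown here only as an illustration: if equation (1) is discretized into a matrix K acting on a vector V of strain values, any nonzero null-space vector of K plays the role of a compatible strain distribution that leaves the surface data unchanged. The matrix below is random and hypothetical, not a discretization of (1).

```python
import numpy as np

# Hypothetical discretization: 3 surface readings, 5 unknown strain values,
# so K necessarily has a nontrivial null space.
rng = np.random.default_rng(0)
K = rng.standard_normal((3, 5))
V_true = rng.standard_normal(5)
U = K @ V_true                       # "measured" surface data

# The last right-singular vector of K lies in its null space.
_, _, Vt = np.linalg.svd(K)
null_vec = Vt[-1]
V_other = V_true + 2.0 * null_vec    # a different "strain field"

print(np.allclose(K @ V_other, U))   # True: same surface data
print(np.allclose(V_other, V_true))  # False: different solution
```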


Fig. 2. - A two-dimensional body with a line defect (or dislocation loop).


Consider a two-dimensional case. Equation (1) has a unique solution when Ω is a contour C (Fig. 2). One interpretation of this case is that C is a dislocation loop. The Somigliana dislocation density b yields nonelastic strains, defined on C,

$$\varepsilon^p_{ij} = \tfrac{1}{2}\left(b_i n_j + b_j n_i\right)$$


Let the equation of the closed contour C be r = r(θ). Equation (1) then becomes

$$\int_0^{2\pi} C_{ijkl}\,G_{km,l}(x - x')\,\varepsilon^p_{ij}(\theta)\,r(\theta)\,d\theta\,\Big|_{x = (r(\theta)\cos\theta,\; r(\theta)\sin\theta)} = \int_{\partial D} C_{ijkl}\,G_{km,l}(x - x')\,u_i(x)\,n_j\,ds + \tfrac{1}{2}\,u_m(x'), \qquad x' \in \partial D \qquad (2)$$

In this case, $\varepsilon^p_{ij}(\theta)$ as well as the function r(θ) (the shape of contour C) are determined uniquely from the surface displacements. The reason for the uniqueness is as follows.
We have shown (Gao and Mura, 1989) that the displacements at points not belonging to Ω are uniquely determined from the residual surface displacements. Therefore, if rigid-body motion is properly excluded, the displacements inside and outside contour C, denoted by $u_i^{\mathrm{in}}$ and $u_i^{\mathrm{out}}$ respectively, are determined by the surface displacements. The dislocation density $b_i$ is then deduced from the mismatch of $u_i^{\mathrm{in}}$ and $u_i^{\mathrm{out}}$ on contour C. Hence, equation (2) has a unique solution for $\varepsilon^p_{ij}(\theta)$ and r(θ).


Fig. 3. - The initial and final configurations of the iterations compared to the actual shape of contour C. The actual values of the nonelastic strains are $\varepsilon^p_{ij} = 1$, while the computed values are $\varepsilon^p_{11} = 0.97$, $\varepsilon^p_{22} = 0.97$, $\varepsilon^p_{12} = 0.9$.
An example is shown in Fig. 3, in which we expand r = r(θ) in a Fourier series

$$r(\theta) = \sum_{n=0}^{\infty}\left(a_n\sin n\theta + b_n\cos n\theta\right) \approx b_0 + a_1\sin\theta + b_1\cos\theta \qquad (3)$$

If we assume that the $\varepsilon^p_{ij}$ are constants, equation (2) becomes a nonlinear equation for $\varepsilon^p_{11}$, $\varepsilon^p_{12}$, $\varepsilon^p_{22}$, $b_0$, $a_1$ and $b_1$. Choosing an initial configuration of r(θ), the nonlinear equation is solved by an optimization algorithm (subroutine ZXMIN in the IMSL Library), which minimizes the difference between the right- and left-hand sides of (2). The initial and final configurations of r = r(θ) are compared to the actual shape of r(θ) in Fig. 3. The actual values of the nonelastic strains are $\varepsilon^p_{ij} = 1$, while the computed ones are $\varepsilon^p_{11} = 0.97$, $\varepsilon^p_{22} = 0.97$ and $\varepsilon^p_{12} = 0.9$, respectively.


Fig. 4. - A body with a point defect Ω; $x - x' \approx x_0 - x'$ for $x \in \Omega$ and $x' \in \partial D$.


Another interesting case is as follows. When domain Ω is small and far away from the surface ∂D, Ω can be treated as a point defect (Fig. 4). Note that $x - x' \approx x_0 - x'$ for $x' \in \partial D$, $x \in \Omega$ and a fixed point $x_0$ inside Ω. Equation (1) can then be simplified to

$$C_{ijkl}\,G_{km,l}(x_0 - x')\int_{\Omega}\varepsilon^p_{ij}(x)\,dx = \int_{\partial D} C_{ijkl}\,G_{km,l}(x - x')\,u_i(x)\,n_j\,ds + \tfrac{1}{2}\,u_m(x'), \qquad x' \in \partial D \qquad (4)$$

The location of the defect $x_0$, as well as the quantity $\int_\Omega \varepsilon^p_{ij}(x)\,dx$, can be obtained by employing a suitable algorithm for the nonlinear problem (4).


3.  From  the  Nonlinear  Problem  to  a  Linear  Problem 

As we have mentioned, in general the nonlinear problem (1) of determining Ω and $\varepsilon^p_{ij}$ cannot be solved uniquely. Even for certain specific problems (e.g., equations (2) and (4)) where uniqueness is guaranteed, constructing a suitable algorithm is still a difficult task. On the other hand, the degree of difficulty is greatly reduced if the nonlinear problem is converted into a linear one.

Problem (1) becomes a linear problem when domain Ω is specified. The question is how to specify Ω, since we do not know its shape and location.


Choose a domain Ω* (equivalent damage domain) such that Ω is contained inside Ω*. Equation (1) is changed to

$$\int_{\Omega^*} C_{ijkl}\,G_{km,l}(x - x')\,\varepsilon^p_{ij}(x)\,dx = \int_{\partial D} C_{ijkl}\,G_{km,l}(x - x')\,u_i(x)\,n_j\,ds + \tfrac{1}{2}\,u_m(x'), \qquad x' \in \partial D \qquad (5)$$

which is a linear problem for determining $\varepsilon^p_{ij}$ in Ω*. We now discuss the relationship between the solutions of the nonlinear problem (1) and the linear problem (5).


Conclusion 1. The stress fields caused by the two solutions are identical in the region outside Ω*.

Conclusion 2. The minimum elastic energy (or any other quadratic function of $\varepsilon^p_{ij}$) over all $\varepsilon^p_{ij}$ satisfying (5) is a lower bound of that of the actual nonelastic strains satisfying (1).

The above conclusions are based on the fact that if the surface displacements and traction forces are zero on a part of the boundary of an elastic body, the displacements and stresses are identically zero in the whole elastic body. The details of the discussion can be found in Gao and Mura (1989). The same idea applies to the problem of calculating the shielding effects due to an unknown distribution of micro-defects in an unknown domain Ω by measuring the crack opening displacements (Gao and Mura, 1990; see also Fig. 5).


Fig. 5. - An infinite medium with a crack. The shielding effects of the micro-defects can be calculated from measurements of crack opening displacements.
4. From the Linear Problem to a New Nonlinear Problem

By specifying domain Ω as Ω*, we have changed the nonlinear problem (1) into the linear equation (5). The solution of the linear problem (5) preserves important characteristics of the actual nonelastic strains. However, equation (5) is still an ill-posed problem (nonunique and unstable), which cannot be solved directly.


Let us write equation (5) as

$$U(x') = \int_{\Omega^*} K(x, x')\,V(x)\,dx, \qquad x' \in \partial D \qquad (6)$$

where U(x') is known since $u_i(x)$ is given on ∂D, $K(x, x') = C_{ijkl}\,G_{km,l}(x - x')$, and V(x) is an unknown vector whose components are $\varepsilon^p_{ij}$.


Now consider the variational problem

$$\min\ \|V(x)\|^2 \quad \text{subject to} \quad \left\|\int_{\Omega^*} K(x, x')\,V(x)\,dx - U(x')\right\|^2 = \epsilon \qquad (7)$$

where ε is a small number chosen from the accuracy of the measurements and

$$\|V(x)\|^2 = \int_{\Omega^*} V^T(x)\,V(x)\,dx \qquad (8)$$

where $V^T$ is the transpose of V.


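The use of a Lagrange multiplier λ transforms (7) to

$$\min\ \left\{\|V(x)\|^2 + \lambda\left(\left\|\int_{\Omega^*} K(x, x')\,V(x)\,dx - U(x')\right\|^2 - \epsilon\right)\right\} \qquad (9)$$

The Euler equation of (9) becomes

$$\int_{\Omega^*} K^*(x, y)\,V(x)\,dx + \alpha\,V(y) = U^*(y), \qquad y \in \Omega^* \qquad (10)$$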
where

$$\alpha = 1/\lambda, \qquad K^*(x, y) = \int_{\partial D} K^T(y, x')\,K(x, x')\,dx', \qquad U^*(y) = \int_{\partial D} K^T(y, x')\,U(x')\,dx' \qquad (11)$$
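To see where (10) and (11) come from, consider a discrete analogue (an illustration, not the authors' derivation): replace the integral operator by a matrix K, so that the functional in (9) becomes $J(V) = \|V\|^2 + \lambda\left(\|KV - U\|^2 - \epsilon\right)$. Setting its gradient to zero gives

$$\nabla_V J = 2V + 2\lambda K^T(KV - U) = 0 \;\Longrightarrow\; \left(K^T K + \alpha I\right)V = K^T U, \qquad \alpha = 1/\lambda$$

Here $K^TK$ and $K^TU$ play the roles of $K^*$ and $U^*$ in (11). For α > 0 the matrix $K^TK + \alpha I$ is positive definite, which is the discrete counterpart of (10) being a well-posed equation of the second kind.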

The integral equation (10) is solved for V(x) with parameter α. The value of α is determined from

$$f(\alpha) = \left\|\int_{\Omega^*} K(x, x')\,V(x)\,dx - U(x')\right\|^2 - \epsilon = 0 \qquad (12)$$

Equation (10) is a well-posed Fredholm integral equation of the second kind with a self-adjoint kernel. For any chosen parameter α, equation (10) can be solved by conventional techniques such as the finite element method. However, the parameter α must satisfy the nonlinear algebraic equation (12). Therefore, the new problem is essentially a nonlinear problem again, but a much easier one than the original nonlinear problem (1).

The nonlinear function f(α) is an increasing function of α and has only one root (Gao and Mura, 1989). Therefore, the root can be found by the bisection algorithm. The algorithm converges rather fast, since in each iteration the interval containing the root of f(α) is halved.
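A discrete sketch of the whole procedure follows, under the assumption that (10) is discretized as the matrix system $(K^TK + \alpha I)V = K^TU$, a standard discrete analogue of (10)-(11). K and U here are random, hypothetical stand-ins for a finite element discretization, and the helper names `solve_v`, `f` and `choose_alpha` are illustrative. Because α may span many orders of magnitude, the sketch bisects on log α (geometric midpoint); this is still bisection, with the bracket halved in log scale at each step.

```python
import numpy as np


def solve_v(K, U, alpha):
    """Discrete analogue of Eq. (10): (K^T K + alpha I) V = K^T U."""
    n = K.shape[1]
    return np.linalg.solve(K.T @ K + alpha * np.eye(n), K.T @ U)


def f(K, U, alpha, eps):
    """Discrete analogue of Eq. (12): squared residual minus eps."""
    r = K @ solve_v(K, U, alpha) - U
    return float(r @ r) - eps


def choose_alpha(K, U, eps, lo=1e-12, hi=1e6, iters=200):
    """Bisection for the root of the increasing function f(alpha)."""
    if f(K, U, lo, eps) > 0.0 or f(K, U, hi, eps) < 0.0:
        raise ValueError("root of f(alpha) is not bracketed by [lo, hi]")
    for _ in range(iters):
        mid = np.sqrt(lo * hi)       # midpoint in log(alpha)
        if f(K, U, mid, eps) > 0.0:
            hi = mid                 # residual too large: alpha too big
        else:
            lo = mid
    return np.sqrt(lo * hi)


# Hypothetical data: noisy "surface measurements" of a random strain vector.
rng = np.random.default_rng(1)
K = rng.standard_normal((30, 20))
V_true = rng.standard_normal(20)
noise = 0.01 * rng.standard_normal(30)
U = K @ V_true + noise
eps = float(noise @ noise)           # discrepancy level set to the noise energy
alpha = choose_alpha(K, U, eps)
V = solve_v(K, U, alpha)
```

Setting ε to the noise energy mirrors the text's choice of ε "from the accuracy of measurement"; with real data it would come from the measurement error estimate.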


Fig. 6. - Even though Ω (where nonelastic strains are distributed) is unknown, it is always possible to cover Ω with a chosen domain Ω*. The original nonlinear problem is, therefore, changed into a linear problem.


Let us consider the example shown in Fig. 6. The two-dimensional half-space is given by $x_2 > 0$, and the nonelastic strains distributed inside Ω are


-  2  «?*  -  28  (xx  -  0.6)  (x2  -  1.0) 
2  -  -  -  20  (x2  -  1.7)  cos  2xx. 


Table 1

| Ω* | ‖V‖ | γ | Element number | Element length |
|---|---|---|---|---|
| 0 ≤ x1 ≤ 0.5; 0.5 ≤ x2 ≤ 1.0 | 3.34 | 1.0 | 25 | L[x1]: 0.1; L[x2]: 0.1 |
| 0 ≤ x1 ≤ 0.54; 0.48 ≤ x2 ≤ 1.02 | 8.10 | 1.17 | 36 | L[x1]: 0.09; L[x2]: 0.09 |
| 0 ≤ x1 ≤ 0.616; 0.48 ≤ x2 ≤ 1.096 | 7.98 | 1.52 | 49 | L[x1]: 0.088; L[x2]: 0.088 |
| 0 ≤ x1 ≤ 0.744; 0.47 ≤ x2 ≤ 1.27 | 7.44 | 2.38 | 64 | L[x1]: 0.093; L[x2]: 0.1 |

Ω = (0 ≤ x1 ≤ 0.5; 0.5 ≤ x2 ≤ 1). The exact value of ‖V‖ is 8.33.
γ = area of Ω* / area of Ω


(13) 
Table 1 lists some characteristic quantities from the calculations of ||V|| for different
choices of the domain Ω*. For instance, the second row of Table 1 shows that Ω* is taken
as 0 ≤ x_1 ≤ 0.54, 0.48 ≤ x_2 ≤ 1.02 to cover Ω; 36 elements are used, and the element
length in both directions is 0.09. The computed value of ||V(x)|| is 8.1, which is a
lower bound of the actual value 8.33.
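As a quick consistency check on Table 1 (an illustration, not part of the original computation), the γ column of Eq. (13) and the element counts follow from the domain bounds and element lengths by simple arithmetic, assuming uniform rectangular elements; the fourth row is read as 0.47 ≤ x2 ≤ 1.27. The helper names are hypothetical.

```python
# Table 1 rows: (x1 range, x2 range, element length L[x1], element length L[x2])
ROWS = [
    ((0.0, 0.5), (0.5, 1.0), 0.1, 0.1),
    ((0.0, 0.54), (0.48, 1.02), 0.09, 0.09),
    ((0.0, 0.616), (0.48, 1.096), 0.088, 0.088),
    ((0.0, 0.744), (0.47, 1.27), 0.093, 0.1),
]
OMEGA = ((0.0, 0.5), (0.5, 1.0))  # Omega: 0 <= x1 <= 0.5, 0.5 <= x2 <= 1

def area(box):
    (a1, b1), (a2, b2) = box
    return (b1 - a1) * (b2 - a2)

def covers(box, inner=OMEGA):
    """True if the rectangle `box` contains the rectangle `inner`."""
    return all(lo <= ilo and ihi <= hi for (lo, hi), (ilo, ihi) in zip(box, inner))

def row_summary(x1, x2, l1, l2):
    """Return (gamma of Eq. (13) to two decimals, number of uniform elements)."""
    gamma = area((x1, x2)) / area(OMEGA)
    n = round((x1[1] - x1[0]) / l1) * round((x2[1] - x2[0]) / l2)
    return round(gamma, 2), n

for x1, x2, l1, l2 in ROWS:
    print(covers((x1, x2)), row_summary(x1, x2, l1, l2))
```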

It should be mentioned that if we replace ||V(x)||² in equation (7) by the elastic
strain energy and modify (10) accordingly, we can obtain a lower bound of the elastic
strain energy.


Fig. 7. - Ω is the circular domain in a half-space x_2 ≥ 0. Ω* covers Ω.


Another example is shown in Fig. 7. The nonelastic strains

$$\epsilon^p_{12} = \tfrac{1}{2} \qquad (14)$$
occur in the domain Ω (r < 0.2). The domain Ω* is chosen to contain Ω. By solving
equations (10) and (12), we obtain a distribution of nonelastic strains in Ω* that
differs from (14). However, as Fig. 8 shows, in the region outside Ω* the stress field
induced by the computed nonelastic strains in Ω* is the same as that induced by (14).


Fig. 8. - The residual stresses σ_rr and σ_rθ at r/a = 1 caused by (14) are identical to
those caused by the nonelastic strains computed in Ω*. The solid lines are the computed
results and the dashed lines are induced by ε^p in (14).


Acknowledgment 


This research was supported under U.S. Army Research Office Contract No.
DAAL03-88-C-0027 through a subcontract with Rockwell International Science Center.


References 

Backus, G. E. and Gilbert, J. F. (1967). Numerical applications of a formalism for
geophysical inverse problems, Geophys. J. R. Astr. Soc. 13.

Backus, G. E. and Gilbert, J. F. (1968). The resolving power of gross earth data,
Geophys. J. R. Astr. Soc. 16.

Backus, G. E. and Gilbert, J. F. (1970). Uniqueness in the inversion of inaccurate gross
earth data, Phil. Trans. Roy. Soc. 266.

Gao, Z. and Mura, T. (1989). On the inversion of residual stresses from surface
displacements, J. Appl. Mech., Vol. 56, No. 3.

Gao, Z. and Mura, T. (1990). IUTAM Symposium on inelastic deformation of composite
materials (submitted).

Gladwell, G. M. L. (1986). Inverse problems in vibration, Martinus Nijhoff, Dordrecht.

Ogura, Y. (1983). Height determination studies for planar defects by means of
ultrasonic testing, The Non Destructive Testing Journal, Japan, Vol. 1, No. 1.




QUADRATIC  DYNAMICAL  SYSTEMS  DESCRIBING 
SHEAR  FLOW  OF  NON-NEWTONIAN  FLUIDS  * 

D.  S.  Malkus1,  J.  A.  Nohel2,  and  B.  J.  Plohr3 


Center  for  the  Mathematical  Sciences 
University  of  Wisconsin-Madison 
Madison,  WI  53705 


Abstract 


Phase-plane techniques are used to analyze a quadratic system of ordinary differential
equations that approximates a single relaxation-time system of partial differential
equations used to model transient behavior of highly elastic non-Newtonian liquids in
shear flow through slit dies. The latter one-dimensional model is derived from
three-dimensional balance laws coupled with differential constitutive relations well
known to rheologists. The resulting initial-boundary-value problem is globally well posed
and possesses the key feature that the steady shear stress is a non-monotone function of
the strain rate. Results of the global analysis of the quadratic system of ODEs lead to
the same qualitative features as those obtained recently by numerical simulation of the
governing PDEs for realistic data for polymer melts used in rheological experiments. The
analytical results provide an explanation of the experimentally observed phenomenon
called spurt; they also predict new phenomena discovered in the numerical simulation;
these phenomena should also be observable in experiments.
1. Introduction

The purpose of this paper is to analyze novel phenomena in dynamic shearing flows of
non-Newtonian fluids that are important in polymer processing [17]. One striking
phenomenon, called “spurt,” was apparently first observed by Vinogradov et al. [19] in
experiments concerning quasi-static flow of monodisperse polyisoprenes through capillaries
or, equivalently, through slit dies. They found that the volumetric flow rate increased
dramatically at a critical stress that was independent of molecular weight. Until
recently, spurt had been associated with the failure of the flowing polymer to adhere to
the wall [5]. The focus of our current research is to offer an alternative explanation of
spurt and related phenomena.

Understanding these phenomena has proved to be of significant physical, mathematical, and
computational interest. In our recent work [12], we found that a satisfactory explanation
and modeling of the spurt phenomenon requires studying the full dynamics of the equations
of motion and constitutive equations. The common and key feature of constitutive models
that exhibit spurt and related phenomena is a non-monotonic relation between the steady
shear stress and strain rate. This allows jumps in the steady strain rate to form when the
driving pressure gradient exceeds a critical value; such jumps correspond to the sudden
increase in volumetric flow rate observed in the experiments of Vinogradov et al.
The governing systems used to model such one-dimensional flows are analyzed in [12] by
numerical techniques and simulation, and in the present work by analytical methods. The
systems derive from fully three-dimensional differential constitutive relations with m
relaxation times (based on work of Johnson and Segalman [8] and Oldroyd [16]). They are
evolutionary, globally well posed in a sense described below, and they possess
discontinuous steady states of the type mentioned above that lead to an explanation of
spurt. The governing systems for shear flows through slit dies are formulated from balance
laws in Sec. 2.

Specifically, we model these flows by decomposing the total shear stress into a polymer
contribution, evolving in accordance with a differential constitutive relation with a
single relaxation time, and a Newtonian viscosity contribution (see system (JSO) in
Sec. 2). The flows can also be modelled by a system based on a differential constitutive
law with two widely spaced relaxation times but no Newtonian viscosity contribution (see
system (JSO2) in [13]). Numerical simulation [9, 12] of transient flows at high
Weissenberg (Deborah) number and very low Reynolds number using the model (JSO) exhibited
spurt, shape memory, and hysteresis; furthermore, it predicted other effects, such as
latency, normal stress oscillations, and molecular weight dependence of hysteresis, that
should be analyzed further and tested in rheological experiments.

In earlier work, Hunter and Slemrod [7] used techniques of conservation laws to study the
qualitative behavior of discontinuous steady states in a simple one-dimensional
viscoelastic model of rate type with viscous damping. They predicted shape memory and
hysteresis effects related to spurt. A salient feature of their model is linear
instability and loss of evolutionarity in a certain region of state space.

The objective of the present paper is to develop analytical techniques whose results
verify these rather dramatic implications of numerical simulation. Based on scaling
introduced in [12], appropriate for the highly elastic and very viscous polyisoprenes used
in the spurt experiment, we are led to study the following pair of quadratic autonomous
ordinary differential equations that approximates the governing system (JSO) in the
relevant range of physical parameters for each fixed position in the channel:

$$\dot\sigma = (Z+1)\,\frac{T-\sigma}{\varepsilon} - \sigma, \qquad \dot Z = -\sigma\,\frac{T-\sigma}{\varepsilon} - Z. \qquad (1.1)$$


Here the dot denotes the derivative d/dt, T is a parameter that depends on the driving
pressure gradient as well as on the position x in the channel, and ε > 0 is a ratio of
viscosities. System (1.1) is obtained by setting α = 0 in the momentum equation in system
(JSO); this approximation is reasonable because α is at least several orders of magnitude
smaller than ε. We show that steady states of system (JSO), some of which are
discontinuous for non-monotone constitutive relations, correspond to critical points of
the quadratic system. We deduce the local characters of the critical points, and we prove
that system (1.1) has no periodic orbits or closed separatrix cycles. Moreover, this
system is endowed with a natural Lyapunov-like function with the aid of which we are able
to determine the global dynamics of the approximating quadratic system completely and thus
identify its globally asymptotically stable critical points (i.e., steady states) for each
position x. This analysis is carried out in Sec. 3. When α, the ratio of Reynolds to
Deborah numbers, is strictly positive, the stability of discontinuous steady states of
system (JSO) remains to be settled. Recently, Nohel, Pego and Tzavaras [15] established
such a result for a simple model in which the polymer contribution to the shear stress
satisfies a single differential constitutive relation; for a particular choice, their
model and system (JSO) with α > 0 have the same behavior in steady shear. Their asymptotic
stability result, combined with numerical experiments and research in progress, suggests
that the same result holds for the full system (JSO), at least when α is sufficiently
small.

In Sec. 4, the analysis of Sec. 3 is applied to each point x in the channel, allowing us
to explain spurt, shape memory, hysteresis, and other effects originally observed in the
numerical simulations in terms of a continuum of phase portraits. We discuss asymptotic
expansions of solutions of systems (JSO) and (JSO2) of Ref. [13] in powers of ε that
enable us to explain latency (a pseudo-steady state that precedes spurt). The asymptotic
analysis also permits a more quantitative comparison of the dynamics of the two models
when ε is sufficiently small. In Sec. 5, we discuss physical implications of the analysis,
particularly those that suggest new experiments. In Sec. 6, we draw certain conclusions.
Although the analysis in this paper applies only to the special constitutive models we
have studied, we expect that the qualitative features of our results appear in a broad
class of non-Newtonian fluids. Indeed, numerical simulation by Kolkka and Ierley [10]
using another model with a single relaxation time and Newtonian viscosity exhibits very
similar character.
2. A Johnson-Segalman-Oldroyd Model for Shear Flow

The motion of a fluid under incompressible and isothermal conditions is governed by the
balance of mass and linear momentum. The response characteristics of the fluid are
embodied in the constitutive relation for the stress. For viscoelastic fluids with fading
memory, these relations specify the stress as a functional of the deformation history of
the fluid. Many sophisticated constitutive models have been devised; see Ref. [2] for a
survey. Of particular interest is a class of differential models with m relaxation times,
derived in a three-dimensional setting in Refs. [12] and [13]; these models can be
regarded as special cases of the Johnson-Segalman model [8] when the memory function is a
linear combination of m decaying exponentials with positive coefficients, or of the
Oldroyd differential constitutive equation [16].

Essential properties of constitutive relations are exhibited in simple planar Poiseuille
shear flow. We study shear flow of a non-Newtonian fluid between parallel plates, located
at x = ±h/2, with the flow aligned along the y-axis, symmetric about the center line, and
driven by a constant pressure gradient f. We restrict attention to the simplest single
relaxation-time differential model that possesses steady-state solutions exhibiting a
non-monotone relation between the total steady shear stress and strain rate, and thereby
reproduces spurt and related phenomena discussed below. The total shear stress T is
decomposed into a polymer contribution and a Newtonian viscosity contribution. When
restricted to one space dimension, the initial-boundary value problem governing the flow,
in non-dimensional units with distance scaled by h, can be written in the form (see
Refs. [9, 12]):

$$\begin{aligned}
\alpha v_t &= \sigma_x + \varepsilon v_{xx} + f,\\
\sigma_t - (Z+1)\,v_x &= -\sigma,\\
Z_t + \sigma v_x &= -Z
\end{aligned} \qquad (\mathrm{JSO})$$

on the interval [−1/2, 0], with boundary conditions

$$v(-1/2, t) = 0 \quad\text{and}\quad v_x(0, t) = 0 \qquad (\mathrm{BC})$$

and initial conditions

$$v(x,0) = v_0(x), \quad \sigma(x,0) = \sigma_0(x), \quad Z(x,0) = Z_0(x), \qquad -1/2 \le x \le 0; \qquad (\mathrm{IC})$$

symmetry of the flow and compatibility with the boundary conditions require that
v_0(−1/2) = 0, v_0'(0) = 0, and σ_0(0) = 0.

The evolution of σ, the polymer contribution to the shear stress, and of Z, a quantity
proportional to the normal stress difference, is governed by the second and third
equations in system (JSO). As a result of scaling motivated by numerical simulation and
introduced in Ref. [12], there are only three essential parameters: α is a ratio of
Reynolds number to Deborah number, ε is a ratio of viscosities, and f is the constant
pressure gradient.

When ε = 0 and Z + 1 > 0, system (JSO) is hyperbolic, with characteristic speeds
±[(Z + 1)/α]^{1/2} and 0. Moreover, for smooth initial data in the hyperbolic region and
compatible with the boundary conditions, techniques in [18] can be used to establish
global well-posedness (in terms of classical solutions) if the data are small, and
finite-time blow-up of classical solutions if the data are large. If ε > 0, system (JSO)
is globally well posed for any smooth or piecewise smooth data; indeed, general theory
developed in [15] (see Sec. 3 and particularly Appendix A) yields global existence of
classical solutions for smooth initial data of arbitrary size, and also existence of
almost classical, strong solutions with discontinuities in the initial velocity gradient
and in stress components; the latter result allows one to prescribe discontinuous initial
data of the same type as the discontinuous steady states studied in this paper.

The steady-state solutions of system (JSO) play an important role in our discussion. Such
a solution, denoted by v, σ, and Z, can be described as follows. The stress components σ
and Z are related to the strain rate v_x through the relations

$$\sigma = \frac{v_x}{1+v_x^2}, \qquad Z + 1 = \frac{1}{1+v_x^2}. \qquad (2.1)$$

Therefore, the steady total shear stress T := σ + εv_x is given by T = w(v_x), where

$$w(s) := \frac{s}{1+s^2} + \varepsilon s. \qquad (2.2)$$

The properties of w, the steady-state relation between shear stress and shear strain
rate, are crucial to the behavior of the flow. By symmetry, it suffices to consider
s ≥ 0. For all ε > 0, the function w has inflection points at s = 0 and s = √3. When
ε > 1/8, the function w is strictly increasing, but when ε < 1/8, the function w is not
monotone. Lack of monotonicity is the fundamental cause of the non-Newtonian behavior
studied in this paper; hereafter we assume that ε < 1/8.

The graph of w is shown in Fig. 1. Specifically, w has a local maximum at s = s_M and a
local minimum at s = s_m, where it takes the values T_M := w(s_M) and T_m := w(s_m),
respectively. As ε → 1/8, the two critical points coalesce at s = √3.
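The extrema of w can be found in closed form: setting w'(s) = (1 − s²)/(1 + s²)² + ε = 0 and writing u = s² gives the quadratic εu² + (2ε − 1)u + (1 + ε) = 0, whose discriminant is 1 − 8ε, which is where the threshold ε = 1/8 comes from. The sketch below computes s_M, s_m, T_M and T_m; the value ε = 0.05 and the function names are illustrative assumptions.

```python
import math

def w(s, eps):
    """Steady total shear stress, Eq. (2.2): w(s) = s/(1 + s^2) + eps*s."""
    return s / (1.0 + s * s) + eps * s

def extrema(eps):
    """Return ((s_M, T_M), (s_m, T_m)), the local max and min of w, for 0 < eps < 1/8.

    Roots in u = s^2 of eps*u^2 + (2*eps - 1)*u + (1 + eps) = 0.
    """
    if not 0.0 < eps < 0.125:
        raise ValueError("w is monotone unless 0 < eps < 1/8")
    root = math.sqrt(1.0 - 8.0 * eps)
    s_M = math.sqrt((1.0 - 2.0 * eps - root) / (2.0 * eps))  # smaller root: maximum
    s_m = math.sqrt((1.0 - 2.0 * eps + root) / (2.0 * eps))  # larger root: minimum
    return (s_M, w(s_M, eps)), (s_m, w(s_m, eps))

(s_M, T_M), (s_m, T_m) = extrema(0.05)
print(s_M, T_M, s_m, T_m)
```

For ε = 0.05 this gives T_M ≈ 0.553 and T_m ≈ 0.435, and the inflection point √3 lies between s_M and s_m, as the text states.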

The momentum equation, together with the boundary condition at the centerline, implies
that the steady total shear stress satisfies T = −fx for every x ∈ [−1/2, 0]. Therefore,
the steady velocity gradient can be determined as a function of x by solving

$$w(v_x) = -fx. \qquad (2.3)$$


Equivalently, a steady-state solution v_x satisfies the cubic equation P(v_x) = 0, where

$$P(s) := \varepsilon s^3 - T s^2 + (1+\varepsilon)s - T. \qquad (2.4)$$


The steady velocity profile in Fig. 2 is obtained by integrating v_x and using the
boundary condition at the wall. However, because the function w is not monotone, there
might be up to three distinct values of v_x that satisfy Eq. (2.3) for any particular x
in the interval [−1/2, 0]. Consequently, v_x can suffer jump discontinuities, resulting
in kinks in the velocity profile (as at the point x* in Fig. 2). Indeed, a steady
solution must contain such a jump if the total stress T_wall = f/2 at the wall exceeds
the total stress T_M at the local maximum M in Fig. 1.
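The candidate steady strain rates at a given stress T are just the positive real roots of the cubic P in (2.4). The sketch below, assuming NumPy's polynomial root finder and the illustrative value ε = 0.05, shows one root below T_m, three roots inside (T_m, T_M), and one root above T_M; the function name is hypothetical.

```python
import numpy as np

def steady_strain_rates(T, eps, tol=1e-9):
    """Positive real roots of P(s) = eps*s^3 - T*s^2 + (1 + eps)*s - T, Eq. (2.4).

    Each root is a steady strain rate v_x with w(v_x) = T; generically one or three.
    """
    roots = np.roots([eps, -T, 1.0 + eps, -T])
    return sorted(float(r.real) for r in roots if abs(r.imag) < tol and r.real > 0)

eps = 0.05
for T in (0.3, 0.5, 0.7):
    print(T, steady_strain_rates(T, eps))
```

For T = 0.5 the cubic factors as P(s) = (s − 2)(s² − 8s + 5)/20, so the three strain rates are 4 − √11, 2, and 4 + √11; the smallest lies on the classical branch and the largest on the spurt branch.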

Finally, we remark that the flow problem discussed here can also be modelled by a system
based on a differential constitutive law with two widely spaced relaxation times but no
Newtonian viscosity contribution (see system (JSO2) in Sec. 2 of [13]); with an
appropriate choice of relevant parameters, the resulting problem exhibits the same steady
states and the same characteristics as (JSO).
Fig. 1: Total steady shear stress T vs. shear strain rate v_x for steady flow. The case
of three critical points is illustrated; other possibilities are discussed in Sec. 3.


3. Phase Plane Analysis for System (JSO) When α = 0

When α is not zero, the numerical simulations developed in [9, 11, 12] revealed striking
phenomena in shear flow and suggested the analysis that follows. A great deal of
information about the structure of solutions of system (JSO) can be garnered by studying
a quadratic system of ordinary differential equations that approximates it in a certain
parameter range and whose dynamics can be determined completely. Motivation for this
approximation comes from the following observation: in the experiments of Vinogradov et
al. [19], α is of the order 10^{−12}; thus the term αv_t in the momentum equation of
system (JSO) is negligible even when v_t is moderately large. This led us to the
approximation to system (JSO) obtained when α = 0.

When α = 0, the momentum equation in system (JSO) can be integrated to show that the
total shear stress T := σ + εv_x coincides with the steady value T(x) = −fx. Thus
T = T(x) is a function of x only, even though σ and v_x are functions of both x and t.
Substituting v_x = (T − σ)/ε into the remaining equations of system (JSO) yields, for
each fixed x, the autonomous, quadratic, planar system of ordinary differential equations

$$\dot\sigma = (Z+1)\,\frac{T-\sigma}{\varepsilon} - \sigma, \qquad \dot Z = -\sigma\,\frac{T-\sigma}{\varepsilon} - Z. \qquad (3.1)$$

Fig. 2: Velocity profile for steady flow.

Here the dot denotes the derivative d/dt. We emphasize that, for each f, a different
dynamical system is obtained at each x in the interval [−1/2, 0] in the channel, because
T = −fx. By symmetry, we may focus attention on the case T > 0; also recall from Sec. 2
that ε < 1/8; these are assumed throughout. The dynamical system (3.1) can be analyzed
completely by a phase-plane analysis outlined below; the reader is referred to Sec. 3 in
[13] for further details. Here we state the main results.

The critical points of system (3.1) satisfy the algebraic system

$$(Z+1)(T-\sigma) = \varepsilon\sigma, \qquad \sigma(T-\sigma) = -\varepsilon Z. \qquad (3.2)$$
These equations define, respectively, a hyperbola and a parabola in the σ-Z plane; these
curves are drawn in Fig. 3, which corresponds to the most comprehensive case of three
critical points. The critical points are intersections of these curves. In particular,
critical points lie in the strip 0 < σ < T: outside this strip the hyperbola requires
Z < −1, whereas the parabola gives Z > 0.


Fig.  3:  The  phase  plane  in  the  case  of  three  critical  points. 


Eliminating Z in these equations shows that the σ-coordinates of the critical points
satisfy the cubic equation Q(σ/T) = 0, where

$$Q(\xi) := \frac{T^2}{\varepsilon}\,\xi(\xi-1)^2 + (1+\varepsilon)(\xi-1) + \varepsilon. \qquad (3.3)$$


A straightforward calculation using Eq. (2.4) shows that

$$P(v_x) = P\!\left(\frac{T-\sigma}{\varepsilon}\right) = -\frac{T}{\varepsilon}\,Q(\sigma/T). \qquad (3.4)$$

Thus each critical point of system (3.1) defines a steady-state solution of system
(JSO): such a solution corresponds to a point on the steady total-stress curve (see
Fig. 1) at which the total stress is T(x). Consequently, we have:
Proposition 3.1:

For each position x in the channel and for each ε > 0, there are three possibilities:

(1) there is a single critical point A when T < T_m;

(2) there is a single critical point C when T > T_M;

(3) there are three critical points A, B, and C when T_m < T < T_M.

For simplicity, we ignore the degenerate cases T = T_m or T = T_M, in which two critical
points coalesce.

To determine the qualitative structure of the dynamical system (3.1), we first study the
nature of the critical points. The behavior of orbits near a critical point depends on
the linearization of system (3.1) at this point, i.e., on the eigenvalues of the
Jacobian matrix J associated with Eq. (3.1), evaluated at the critical point. Explicitly,

$$J = \begin{pmatrix} -(Z+1)/\varepsilon - 1 & (T-\sigma)/\varepsilon \\ -(T-2\sigma)/\varepsilon & -1 \end{pmatrix}, \qquad \operatorname{Discrm} J = (\operatorname{Tr} J)^2 - 4\operatorname{Det} J.$$

To avoid solving the cubic equation Q(σ/T) = 0, the character of the eigenvalues of J
can be determined from the signs of the trace Tr J, the determinant Det J, and the
discriminant Discrm J at the critical points. We omit these tedious calculations, one
result of which is a useful fact: at a critical point, ε Det J = Q'(σ/T). This relation
is important because Q' is positive at A and C and negative at B. To assist the reader,
Fig. 3 shows the hyperbola on which σ̇ = 0, the parabola on which Ż = 0 [see Eqs. (3.2)],
and the hyperbola on which Discrm J vanishes. As a result of the analysis above, we draw
the following conclusions:

(1) Tr J < 0 at all critical points;

(2) Det J > 0 at A and C, while Det J < 0 at B; and

(3) Discrm J > 0 at A and B, whereas Discrm J can be of either sign at C. (For typical
values of ε and T, Discrm J < 0 at C; in particular, Discrm J < 0 if C is the only
critical point. But it is possible for Discrm J to be positive if T is sufficiently close
to T_m.)

Standard  theory  of  nonlinear  planar  dynamical  systems  (see,  e.g.,  Ref.  [3,  Chap.  15])  now 
establishes  the  local  characters  of  the  critical  points  A,B,C  in  Proposition  3.1: 

Proposition 3.2:

(1) A is an attracting node (called the classical attractor);

(2) B is a saddle point;

(3) C is either an attracting spiral point or an attracting node (called the spurt
attractor).
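Propositions 3.1 and 3.2 can be checked numerically at a single channel position. The sketch below, assuming the illustrative values ε = 0.05 and T = 0.5 (which lies between T_m and T_M), finds the critical points from the roots of Q in (3.3), recovers Z from the parabola in (3.2), and reads off each point's type from the trace, determinant and discriminant of the Jacobian J written out above. The function names and the returned labels are our own.

```python
import numpy as np

def critical_points(T, eps, tol=1e-9):
    """Critical points (sigma, Z) of system (3.1), ordered by increasing sigma.

    sigma = T*xi, where xi is a root in (0, 1) of
    Q(xi) = (T^2/eps)*xi*(xi - 1)^2 + (1 + eps)*(xi - 1) + eps   (Eq. (3.3)).
    """
    c = T * T / eps
    roots = np.roots([c, -2.0 * c, c + 1.0 + eps, -1.0])
    xis = sorted(float(r.real) for r in roots if abs(r.imag) < tol and 0.0 < r.real < 1.0)
    # Z from the parabola sigma*(T - sigma) = -eps*Z in (3.2)
    return [(T * xi, -T * xi * (T - T * xi) / eps) for xi in xis]

def jacobian(sigma, Z, T, eps):
    """Jacobian of the right-hand side of (3.1) with respect to (sigma, Z)."""
    return np.array([[-(Z + 1.0) / eps - 1.0, (T - sigma) / eps],
                     [-(T - 2.0 * sigma) / eps, -1.0]])

def classify(sigma, Z, T, eps):
    J = jacobian(sigma, Z, T, eps)
    tr, det = np.trace(J), np.linalg.det(J)
    disc = tr * tr - 4.0 * det  # Discrm J
    if det < 0:
        return "saddle"
    if tr < 0:
        return "attracting node" if disc > 0 else "attracting spiral"
    return "repelling"

T, eps = 0.5, 0.05
for sigma, Z in critical_points(T, eps):
    print(round(sigma, 4), round(Z, 4), classify(sigma, Z, T, eps))
```

Ordered by increasing σ, the three points are C (smallest σ, hence largest strain rate), B and A, and they come out as a spiral, a saddle and a node; one can also confirm ε·Det J = Q'(σ/T) at each of them.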

The next task is to determine the global structure of the orbits of system (3.1). In this
direction, we modify an argument suggested by A. Coppel [4] and establish the crucial
result, the proof of which involves a change in the time scale and an application of
Bendixson’s theorem:

Proposition 3.3:

System (3.1) has neither periodic orbits nor separatrix cycles.

To understand the global qualitative behavior of orbits, we construct suitable invariant
sets. In this regard, a crucial tool is that system (3.1) is endowed with the identity

$$\frac{d}{dt}\left\{\sigma^2 + (Z+1)^2\right\} = -2\left[\sigma^2 + \left(Z+\tfrac{1}{2}\right)^2 - \tfrac{1}{4}\right]. \qquad (3.5)$$
Thus the function V(σ, Z) := σ² + (Z + 1)² serves as a Lyapunov function for the
dynamical system. Notice that identity (3.5) is independent of T and ε, because the
terms containing (T − σ)/ε cancel.
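Expanding 2σσ̇ + 2(Z + 1)Ż with (3.1) leaves −2[σ² + Z(Z + 1)], which is the right side of (3.5) after completing the square. A quick numerical spot check of this identity at random states and parameters is sketched below; the sampling ranges and function names are arbitrary illustrative choices.

```python
import random

def rhs(sigma, Z, T, eps):
    """Right-hand side of system (3.1); v = (T - sigma)/eps is the strain rate."""
    v = (T - sigma) / eps
    return (Z + 1.0) * v - sigma, -sigma * v - Z

def dV_dt(sigma, Z, T, eps):
    """d/dt of V = sigma^2 + (Z + 1)^2 along (3.1), by the chain rule."""
    ds, dZ = rhs(sigma, Z, T, eps)
    return 2.0 * sigma * ds + 2.0 * (Z + 1.0) * dZ

def right_side_35(sigma, Z):
    """Right side of identity (3.5); T and eps do not appear."""
    return -2.0 * (sigma ** 2 + (Z + 0.5) ** 2 - 0.25)

rng = random.Random(0)
worst = 0.0
for _ in range(10_000):
    s, z = rng.uniform(-2, 2), rng.uniform(-3, 1)
    T, eps = rng.uniform(0, 1), rng.uniform(0.01, 0.12)
    worst = max(worst, abs(dV_dt(s, z, T, eps) - right_side_35(s, z)))
print(worst)  # differences are at the level of floating-point round-off
```

The right side vanishes on the circle of radius 1/2 centred at σ = 0, Z = −1/2, which passes through the origin and through (0, −1).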

Let Γ denote the circle on which the right side of Eq. (3.5) vanishes (center σ = 0,
Z = −1/2, radius 1/2), and let C_r denote the circle of radius r centered at σ = 0 and
Z = −1, i.e., C_r := {(σ, Z) : V(σ, Z) = r²}, r > 0; each C_r is a level set of V. The
circles Γ and C_1 are shown in Fig. 4, which corresponds to the case of a single critical
point, the spiral point C. Eq. (3.5) also implies that the critical points of system
(3.1) lie on Γ. If r > 1, Γ lies strictly inside C_r. Consequently, Eq. (3.5) shows that
the dynamical system (3.1) flows inward at points along C_r. Thus the interior of C_r is
a positively invariant set for each r > 1. Furthermore, the closed disk bounded by C_1,
which is the intersection of these sets, is also positively invariant. Therefore the
above argument establishes:

Proposition 3.4: Each closed disk bounded by the circle C_r, r ≥ 1, is a positively
invariant set for system (3.1).

The above results, combined with the identification of suitable invariant sets, were used
to determine the global structure of the orbits of system (3.1) in the cases of one and
three critical points, and to analyze the stable and unstable manifolds of the saddle
point at B. These results are shown in Figs. 5 and 6 and summarized in the following
result.

Proposition  3.5: 

The basin of attraction of A, i.e., the set of points that flow toward A as t → ∞,
comprises those points on the same side of the stable manifold of B as A; points on the
other side are in the basin of attraction of C. Moreover, the arc of the circle Γ through
the origin, between B and its reflection B', is contained in the basin of attraction of
A. In particular, since the stable manifold of B forms the boundary of this basin, it
cannot cross Γ between B and B'.


All qualitative features of the dynamics of system (3.1) (except possibly whether C is a
node or a focus) carry over to a system that approximates system (JSO2) in the case of
two widely separated relaxation times (see system (4.3) in [13]).
Fig. 4: The phase plane when the spurt attractor C is the only critical point.

4. Qualitative Features of (JSO) Based on Phase Plane Analysis

The discussion that follows sketches an explanation of recent numerical simulations of
(JSO) described in Refs. [9, 12]. These exhibited several effects related to spurt:
latency, shape memory, and hysteresis. Fig. 7 shows the result of simulating a
“quasi-static” loading sequence in which the pressure gradient f is increased in small
steps, allowing sufficient time between steps to achieve steady flow [9]. The loading
sequence is followed by a similar quasi-static unloading sequence, in which the driving
pressure gradient is decreased in steps. The initial step used zero initial data, and
succeeding steps used the results of the previous step as initial data. The resulting
hysteresis loop includes the shape memory predicted by Hunter and Slemrod [7] for a
simpler model by a different approach. The width of the hysteresis loop at the bottom can
be related directly to the molecular weight of the sample [9].

We now explain spurt, shape memory, hysteresis, and latency. We consider experiments of
the following type: the flow is initially in a steady state corresponding to a forcing
f_0, and the forcing is suddenly changed to f = f_0 + Δf. We call this process “loading”
(resp. “unloading”) if Δf has the same (resp. opposite) sign as f_0. The initial flow can
be described by specifying, for each channel position x, whether the flow is at a
classical attractor A (x is a “classical point”) or a spurt attractor C (x is a “spurt
point”) for the system (3.1) with T = −f_0 x. We shall say that any point lying on the
same side of the stable manifold of B as A lies on the “classical side”; points lying on
the other side are said to be on the “spurt side.” The outcome of the experiment depends
on the character of the phase portrait with T = −fx. To determine this outcome, we need
only decide when a classical point becomes a spurt point or vice versa.

Fig. 5: The orbit through the origin when the spurt attractor C is the only critical
point.

The principal mathematical properties of the dynamical system (3.1) that determine the
outcome of loading and unloading experiments are embodied in the following consequence
of the phase plane analysis.

Proposition  4.1: 

(1) A classical point A_0 for the initial forcing f_0 lies in the domain of attraction of the classical attractor A for f, provided that A exists (i.e., |fx| < T_M);

(2) A spurt point C_0 for the initial forcing f_0 lies in the domain of attraction of the spurt attractor C for f unless (a) C does not exist (i.e., |fx| < T_m); or (b) C_0 lies on the classical side of the stable manifold of the saddle point B for f.

Fig. 6: Phase portrait in the case of three critical points, with C being a spiral.

Consider starting with f_0 = 0 and loading to f > 0. Then the initial state for each x lies at the origin σ = 0, Z = 0. According to Proposition 4.1(1), each x ∈ [-1/2, 0] such that f|x| < T_M is a classical point, while the x for which f|x| > T_M are spurt points (because there is no classical attractor). Consequently, we draw two conclusions:

Proposition  4.2: 

(a) If the forcing is subcritical (i.e., f < f_crit := 2T_M), the asymptotic steady flow is entirely classical.

(b) If the forcing is supercritical (f > f_crit), there is a single kink in the velocity profile (see Fig. 2), located at x_* = -T_M/f; those x ∈ [-1/2, x_*), near the wall, are spurt points, whereas those x ∈ (x_*, 0], near the centerline, are classical.

The solution in case (b) can be described as “top jumping” because the stress T_* = T_M at the kink is as large as possible, and the kink is located as close as possible to the wall.

Next, consider increasing the load from f_0 > 0 to f > f_0. A point x that is classical for f_0 remains classical for f unless there is no classical attractor for T = -fx, i.e., f|x| > T_M. A spurt point x for f_0, on the other hand, is always a spurt point for f. As a result, a point x in the channel can change only from a classical attractor to a spurt attractor, and then only if f|x| exceeds T_M. When f is chosen to be supercritical, loading causes the position x_* of the kink in Fig. 2 to move away from the wall, but only to the extent that it must: a single jump in strain rate occurs at x_* = -T_M/f, where the total stress is T_* = T_M. These conclusions are valid, in particular, for a quasi-static process of gradually increasing the load from f_0 = 0 to f > f_crit.

Fig. 7: Hysteresis under cyclic load: normalized throughput vs. wall shear stress [9].
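The loading rules above can be made concrete with a short Python sketch. It encodes only what the text states: under loading, a point becomes a spurt point exactly when f|x| exceeds T_M; spurt points stay spurt points while the load increases; f_crit = 2T_M; and x_* = -T_M/f. It works on the half-channel x ∈ [-1/2, 0], with the wall at x = -1/2. The function names and the sample value T_M = 0.5 are illustrative assumptions. Unloading is deliberately left out, because case (b) of Proposition 4.1(2) requires the stable manifold of B, which this sketch does not compute.

```python
def critical_forcing(t_max):
    """f_crit = 2*T_M: the load at which the wall point x = -1/2 reaches |T| = T_M."""
    return 2.0 * t_max


def kink_position(f, t_max):
    """Kink x_* = -T_M/f after quasi-static loading from rest; None if f <= f_crit."""
    if f <= critical_forcing(t_max):
        return None  # Proposition 4.2(a): the steady flow is entirely classical
    return -t_max / f


def classify(x, f, t_max):
    """'spurt' or 'classical' for x in the half-channel [-1/2, 0] after loading to f > 0."""
    if not -0.5 <= x <= 0.0:
        raise ValueError("x must lie in [-1/2, 0]")
    # Under loading a point leaves the classical attractor only when f|x| exceeds T_M,
    # and a spurt point stays a spurt point while the load keeps increasing.
    return "spurt" if f * abs(x) > t_max else "classical"


def loading_sequence(loads, t_max):
    """Kink positions along a nondecreasing load sequence that starts from f_0 = 0."""
    if any(later < earlier for earlier, later in zip(loads, loads[1:])):
        raise ValueError("only loading is modeled; unloading needs the phase portrait of (3.1)")
    return [kink_position(f, t_max) for f in loads]


if __name__ == "__main__":
    # Illustrative T_M = 0.5 (its leading-order value for small epsilon), so f_crit = 1.
    print(loading_sequence([0.5, 1.0, 2.0, 4.0], t_max=0.5))
    print(classify(-0.4, 2.0, 0.5), classify(-0.2, 2.0, 0.5))
```

Running it prints `[None, None, -0.25, -0.125]` and then `spurt classical`. The kink appears only once f exceeds f_crit = 1, and it then moves toward the centerline x = 0 as the load grows.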

Now consider unloading from f_0 > 0 to f < f_0; assume, for the moment, that f is positive. Here, the initial steady solution need not correspond to top jumping. For this type of unloading, a point x that is classical for f_0 always remains classical for f: the classical attractor for f exists because f|x| < f_0|x| < T_M. By contrast, a spurt point x for f_0 can become classical at f. This occurs if: (a) the total stress T = -fx falls below T_m; or (b) the spurt attractor C_0 for T = -f_0 x lies on the classical side of the stable manifold of the saddle point B for T = -fx (see Proposition 4.1(2b)).

Combining  the  analysis  of  loading  and  unloading  leads  to  the  following  summary  of 
quasi-static  cycles  and  the  resulting  flow  hysteresis. 

Kinks move away from the wall under top jumping loading; they move toward the wall under bottom jumping unloading; otherwise they remain fixed. The hysteresis loop opens from the point at which unloading commences; no part of the unloading path retraces the loading path until point d in Fig. 7.

To explain the latency effect that occurs during loading, assume that ε is small. It is readily seen that the total stress T_M at the local maximum M is 1/2 + O(ε), while the local minimum m corresponds to a total stress T_m of 2√ε[1 + O(ε)]. Furthermore, for x such that T(x) = O(1), σ = T + O(ε) at the attracting node A, while σ = O(ε) at a spurt attractor C (which is a spiral). Consider a point along the channel for which T(x) > T_M, so that the only critical point of the system (3.1) is C, and suppose that T < 1. Then the evolution of the system exhibits three distinct phases, as indicated in Fig. 6: an initial “Newtonian” phase (O to N); an intermediate “latency” phase (N to S); and a final “spurt” phase (S to C).
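The small-ε values of T_M and T_m quoted above can be checked numerically. This Python sketch assumes that the steady total stress at strain rate s ≥ 0 is T(s) = s/(1 + s²) + εs; that relation is not reproduced in this excerpt. The curve is non-monotone, and the sketch finds its local maximum and minimum by ternary search. The function names and the search brackets are choices of the sketch and are valid for 0 < ε ≤ 0.01.

```python
import math


def steady_total_stress(s, eps):
    """Assumed steady relation T(s) = s/(1 + s^2) + eps*s at strain rate s >= 0."""
    return s / (1.0 + s * s) + eps * s


def ternary_search(fun, lo, hi, maximize, iters=200):
    """Locate the extremum of a unimodal function on [lo, hi]."""
    sign = 1.0 if maximize else -1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if sign * fun(m1) < sign * fun(m2):
            lo = m1  # the extremum lies to the right of m1
        else:
            hi = m2  # the extremum lies to the left of m2
    return 0.5 * (lo + hi)


def stress_extrema(eps):
    """Return (T_M, T_m): the local maximum and local minimum of the steady curve."""
    if not 0.0 < eps <= 0.01:
        raise ValueError("this sketch assumes 0 < eps <= 0.01")
    fun = lambda s: steady_total_stress(s, eps)
    # For eps <= 0.01 the maximum sits near s = 1 and the minimum near s = 1/sqrt(eps) >= 10,
    # so [0, 3] and [3, 10/sqrt(eps)] each contain exactly one extremum.
    s_max = ternary_search(fun, 0.0, 3.0, maximize=True)
    s_min = ternary_search(fun, 3.0, 10.0 / math.sqrt(eps), maximize=False)
    return fun(s_max), fun(s_min)


if __name__ == "__main__":
    for eps in (1e-2, 1e-3, 1e-4):
        t_max, t_min = stress_extrema(eps)
        print(f"eps={eps:g}: T_M={t_max:.6f}, T_m={t_min:.6f}, 2*sqrt(eps)={2 * math.sqrt(eps):.6f}")
```

As ε decreases, the printed T_M approaches 1/2 and the ratio T_m/(2√ε) approaches 1. This is consistent with the estimates in the text.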

The Newtonian phase occurs on a time scale of order ε, during which the system approximately follows an arc of a circle centered at σ = 0 and Z = -1. Having assumed that T < 1, Z approaches

$$Z_N = \left(1 - T^2\right)^{1/2} - 1 \qquad (4.1)$$

as σ rises to the value T. (If, on the other hand, T > 1, the circular arc does not extend as far as T, and σ never attains the value T; rather, the system slowly spirals toward the spurt attractor. Thus the dynamical behavior does not exhibit distinct phases.)
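To see the Newtonian phase, one can integrate the phase-plane system directly. System (3.1) is not reproduced in this excerpt. This Python sketch assumes that, at fixed total stress T, it has the form

σ' = (Z + 1)(T - σ)/ε - σ,  Z' = -σ(T - σ)/ε - Z.

This form is consistent with the behavior described above: orbits from the origin follow circles centered at σ = 0, Z = -1 on the fast time scale ε. The sketch uses a classical fourth-order Runge-Kutta step with Δt = ε/100. The parameter values T = 0.7 and ε = 10⁻³ are illustrative.

```python
import math


def jso_rhs(sigma, z, total_stress, eps):
    """Assumed form of system (3.1) at fixed total stress T.

    The strain rate follows from T = sigma + eps * v_x, so v_x = (T - sigma) / eps.
    """
    v_x = (total_stress - sigma) / eps
    d_sigma = (z + 1.0) * v_x - sigma
    d_z = -sigma * v_x - z
    return d_sigma, d_z


def rk4(sigma, z, total_stress, eps, t_end, dt):
    """Classical fourth-order Runge-Kutta integration from (sigma, z) up to t_end."""
    for _ in range(int(round(t_end / dt))):
        k1 = jso_rhs(sigma, z, total_stress, eps)
        k2 = jso_rhs(sigma + 0.5 * dt * k1[0], z + 0.5 * dt * k1[1], total_stress, eps)
        k3 = jso_rhs(sigma + 0.5 * dt * k2[0], z + 0.5 * dt * k2[1], total_stress, eps)
        k4 = jso_rhs(sigma + dt * k3[0], z + dt * k3[1], total_stress, eps)
        sigma += dt * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        z += dt * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return sigma, z


def z_newtonian(total_stress):
    """Eq. (4.1): end point Z_N = (1 - T^2)^(1/2) - 1 of the Newtonian arc (T < 1)."""
    return math.sqrt(1.0 - total_stress ** 2) - 1.0


if __name__ == "__main__":
    T, eps = 0.7, 1e-3  # illustrative values with T_M < T < 1 and eps small
    # Integrate over 20 fast time units (t = 20*eps), starting from rest at the origin.
    sigma, z = rk4(0.0, 0.0, T, eps, t_end=20 * eps, dt=eps / 100)
    print(f"sigma = {sigma:.4f} (T = {T}), Z = {z:.4f} (Z_N = {z_newtonian(T):.4f})")
```

The printed values should show σ just below T and Z slightly below Z_N. The small gap comes from O(ε) corrections and from the slow latency drift that has already begun.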

The latency phase is characterized by having σ = T + O(ε), so that σ is nearly constant and Z evolves approximately according to the differential equation

$$\dot Z = -\frac{T^2}{Z + 1} - Z . \qquad (4.2)$$


Therefore, the shear stress and velocity profiles closely resemble those for a steady solution with no spurt, but the solution is not truly steady because the normal stress difference Z still changes. Integrating Eq. (4.2) from Z = Z_N to Z = -1 determines the latency period. This period becomes indefinitely long when the forcing decreases to its critical value; thus the persistence of the near-steady solution with no spurt can be very dramatic. The solution remains longest near point L, where Z = -1 + T. There the right side of (4.2) is closest to zero, equal to 1 - 2T, which vanishes as T decreases to 1/2, the leading-order value of T_M. This point may be regarded as the remnant of the attracting node A and the saddle point B. Eventually the solution enters the spurt phase and tends to the critical point C. Because C is an attracting spiral, the stress oscillates between the shear and normal components while it approaches the steady state.
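The latency period can be computed from Eq. (4.2), Ż = -T²/(Z + 1) - Z, by integrating dt = dZ/Ż from Z_N down to -1. With w = Z + 1 the integrand becomes w/(w² - w + T²), which has an elementary antiderivative when T > 1/2. This Python sketch evaluates that closed form and cross-checks it with Simpson's rule. It uses the leading-order value T_M = 1/2, and its times are in the scaled units of system (3.1). The function names are illustrative.

```python
import math


def latency_time(total_stress):
    """Closed-form time for Eq. (4.2) to carry Z from Z_N down to Z = -1.

    With w = Z + 1, dt = w dw / (w^2 - w + T^2); the antiderivative below is
    valid for 1/2 < T < 1 (T_M = 1/2 to leading order in eps).
    """
    T = total_stress
    if not 0.5 < T < 1.0:
        raise ValueError("need 1/2 < T < 1")
    s = math.sqrt(4.0 * T * T - 1.0)

    def antiderivative(w):
        return 0.5 * math.log(w * w - w + T * T) + math.atan((2.0 * w - 1.0) / s) / s

    w_n = math.sqrt(1.0 - T * T)  # Z_N + 1 from Eq. (4.1)
    return antiderivative(w_n) - antiderivative(0.0)


def latency_time_simpson(total_stress, n=2000):
    """Cross-check of the same integral by composite Simpson's rule (n must be even)."""
    T = total_stress
    w_n = math.sqrt(1.0 - T * T)
    h = w_n / n
    g = lambda w: w / (w * w - w + T * T)
    acc = g(0.0) + g(w_n)
    for i in range(1, n):
        acc += (4.0 if i % 2 else 2.0) * g(i * h)
    return acc * h / 3.0


if __name__ == "__main__":
    for T in (0.51, 0.6, 0.7, 0.9):
        print(f"T = {T}: latency = {latency_time(T):.4f} (Simpson {latency_time_simpson(T):.4f})")
```

The computed period grows without bound as T decreases toward 1/2. This is the divergence of the latency period at critical forcing described above.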

Asymptotic analysis carried out in Sec. 6 of [13] shows that when ε is sufficiently small, system (JSOi) of [13] has the same asymptotic properties as system (JSO). Thus system (JSO) approximates (JSO2) quantitatively as well as qualitatively.
5. Physical Implications

One of the widely accepted explanations of spurt and similar observations is that the presence of the wall affects the dynamics of the polymer system near the wall. Conceivably, there could be a variety of “wall effects,” the most obvious being the loss of chemical bonding between wall and fluid, or wall slip [5]. Perhaps the most distinguishing feature of our alternative approach is that it predicts that spurt stems from a material property of the polymer and is not related to any external interaction. The spurt layer forms at the wall in situations such as top jumping because the stresses are higher there; for the same reason, of course, chemical bonds would break at the wall. However, our approach predicts that the layer of spurt points spreads into the interior of the channel on continued loading. Layer thickness is predicted to grow continuously during loading to a thickness that should be observable, provided secondary (two-dimensional) instabilities do not develop.

Our analysis suggests other ways in which experiments might be devised to verify the dependence of spurt on material properties: (i) produce multiple kinks with the spurt layer separated from the wall, and (ii) produce hysteresis in flow reversal (Fig. 9). Our model predicts circumstances under which a different path can be followed in sudden reversal of the flow than would be followed by a sequence of solutions in which the pressure gradient is reduced to zero and reloaded (with the opposite sign) to a value of somewhat smaller magnitude. Such behavior does not seem likely to be explainable by a wall effect.

The most important and perhaps the easiest experiment to perform to verify our theory is to produce latency. Our analysis predicts long latency times for parameter values corresponding to realistic materials; no sophisticated timing device would be required, nor would the onset of the instability be hard to identify. The increase in throughput is predicted to be so dramatic that simple visual inspection of the exit flow would probably be sufficient.

6.  Conclusions 

Although our analysis applies only to the special constitutive models we have studied, we expect that the qualitative features of our results appear in a broad class of non-Newtonian fluids. Our analysis has identified certain universal mathematical features in the shear flow of viscoelastic fluids described by differential constitutive relations that give rise to spurt and related phenomena. The key feature is that there are three widely separated time scales, each associated with an important non-dimensional number (α, ε, and 1, respectively), when scaled by the dominant relaxation time, λ⁻¹. Each of these time scales can be associated with a particular equation in system (JSO) [13]. The key to understanding the dynamics of such systems is fixing the location of the discontinuity in the strain rate induced by the non-monotone character of the steady shear stress vs. strain rate.

Acknowledgments 

We thank Professor A. Coppel for suggesting an elegant argument that rules out the existence of periodic and separatrix cycles for the systems (3.1). We also acknowledge helpful discussions with D. Aronson, M. Denn, G. Sell, M. Slemrod, A. Tzavaras, and M. Yao.




References 


1. A. Andronov and C. Chaikin, Theory of Oscillations, Princeton Univ. Press, Princeton, 1949.

2. R. Bird, R. Armstrong, and O. Hassager, Dynamics of Polymeric Liquids, John Wiley and Sons, New York, 1987.

3. E. Coddington and N. Levinson, Theory of Ordinary Differential Equations, McGraw-Hill, New York, 1955.

4. A. Coppel, private communication, 1989.

5. M. Denn, “Issues in Viscoelastic Fluid Dynamics,” Annual Reviews of Fluid Mechanics, 1989. To appear.

6.  M.  Doi  and  S.  Edwards,  “Dynamics  of  Concentrated  Polymer  Systems,”  J.  Chem. 
Soc.  Faraday  74  (1978),  pp.  1789-1832. 

7.  J.  Hunter  and  M.  Slemrod,  “Viscoelastic  Fluid  Flow  Exhibiting  Hysteretic  Phase 
Changes,”  Phys.  Fluids  26  (1983),  pp.  2345-2351. 

8.  M.  Johnson  and  D.  Segalman,  “A  Model  for  Viscoelastic  Fluid  Behavior  which  Allows 
Non-Affine  Deformation,”  J.  Non-Newtonian  Fluid  Mech.  2  (1977),  pp.  255-270. 

9. R. Kolkka, D. Malkus, M. Hansen, G. Ierley, and R. Worthing, “Spurt Phenomena of the Johnson-Segalman Fluid and Related Models,” J. Non-Newtonian Fluid Mech. 29 (1988), pp. 303-325.

10.  R.  Kolkka  and  G.  Ierley,  “Spurt  Phenomena  for  the  Giesekus  Viscoelastic  Liquid 
Model,”  J.  Non-Newtonian  Fluid  Mech.,  1989.  To  appear. 

11. D. Malkus, J. Nohel, and B. Plohr, “Time-Dependent Shear Flow of a Non-Newtonian Fluid,” in Conference on Current Problems in Hyperbolic Problems: Riemann Problems and Computations (Bowdoin, 1988), ed. B. Lindquist, Amer. Math. Soc., Providence, 1989. Contemporary Mathematics, to appear.

12.  D.  Malkus,  J.  Nohel,  and  B.  Plohr,  “Dynamics  of  Shear  Flow  of  a  Non-Newtonian 
Fluid,”  J.  Comput.  Phys.,  1989.  To  appear. 

13.  D.  Malkus,  J.  Nohel,  and  B.  Plohr,  “Analysis  of  New  Phenomena  In  Shear  Flow  of 
Non-Newtonian  Fluids,”  SIAM  J.  Appl.  Math.,  1989.  Submitted. 

14.  T.  McLeish  and  R.  Ball,  “A  Molecular  Approach  to  the  Spurt  Effect  in  Polymer  Melt 
Flow,”  J.  Polymer  Sci.  24  (1986),  pp.  1735-1745. 

15. J. Nohel, R. Pego, and A. Tzavaras, “Stability of Discontinuous Steady States in Shearing Motions of Non-Newtonian Fluids,” Proc. Roy. Soc. Edinburgh, Series A, 1989. Submitted.


16. J. Oldroyd, “Non-Newtonian Effects in Steady Motion of Some Idealized Elastico-Viscous Liquids,” Proc. Roy. Soc. London A 245 (1958), pp. 278-297.

17. J. Pearson, Mechanics of Polymer Processing, Elsevier Applied Science, London, 1985.

18.  M.  Renardy,  W.  Hrusa,  and  J.  Nohel,  Mathematical  Problems  in  Viscoelasticity, 
Pitman  Monographs  and  Surveys  in  Pure  and  Applied  Mathematics,  Vol.  35,  Longman 
Scientific  &  Technical,  Essex,  England,  1987. 

19. G. Vinogradov, A. Malkin, Yu. Yanovskii, E. Borisenkova, B. Yarlykov, and G. Berezhnaya, “Viscoelastic Properties and Flow of Narrow Distribution Polybutadienes and Polyisoprenes,” J. Polymer Sci., Part A-2 10 (1972), pp. 1061-1084.

20. M. Yao and D. Malkus, “Analytical Solutions of Plane Poiseuille Flow of a Johnson-Segalman Fluid,” in preparation, 1989.




SMART  ALGORITHMS  FOR  COMPLEX 
PROBLEMS  IN  FLUID  DYNAMICS1 


J.  Tinsley  Oden2 


Abstract 


Some recent results obtained using adaptive finite element methods and so-called smart algorithms in two- and three-dimensional problems in fluid mechanics are discussed. These include applications of h- and h-p-adaptive methods on unstructured meshes.


¹The present manuscript is extracted from the full paper on this subject that is to appear in Computers and Structures in a special volume compiled from the Symposium on Frontiers in Computational Mechanics, held in March 1989 in honor of Professor T.H.H. Pian on the occasion of his seventieth birthday.
1 Introduction


The significant challenges of numerical simulation of complex phenomena in fluid dynamics have encouraged the development of new and innovative methods for producing good computational results as efficiently as possible with existing computing capabilities. Chief among these are adaptive methods, which attempt to adapt the mesh size and topology or the spectral order of the approximation so as to yield acceptable results with minimum numbers of grid points or degrees of freedom. In this paper, some recent results are surveyed. Some of the results described are also discussed in the conference papers [1, 2]; others are new results obtained using recently completed three-dimensional adaptive h-codes and h-p codes.


2  Brief  Review  of  Adaptive  Methods 

Adaptive methods in computational mechanics are generally based on a simple idea: when the error in a computation is too large, change the structure of the approximation (the mesh size, the location of grid points, the order of the approximation, etc.) to reduce it. Interest
in  such  procedures  has  grown  gradually  in  recent  years  with  the  realization  that  they  may 
embody  ways  to  optimize  computations  —  to  deliver  the  best  answers  in  some  sense  for  the 
least  effort.  However,  implementation  of  the  adaptive  idea  constitutes  a  significant  departure 
from  conventional  methods  in  CFD  and  involves  many  open  problems.  For  instance,  the  very 
notion  that  one  attempts  to  reduce  error  implies  that  the  error  is  known  or  can  be  estimated 
in  some  sense.  Thus,  the  first  step  in  adaptivity  is  to  develop  measures  of  “goodness”  of 
solutions,  and  such  measures  may  range  from  ad  hoc  checks  of  solution  gradients  to  rigorous 
a  posteriori  error  estimates.  While  progress  in  a  posteriori  error  estimation  has  been  made 
in  recent  months,  this  subject  remains  an  area  of  active  research. 

Having an estimate of the error in the solution at a grid cell, what can one do to systematically reduce it below some preset level? In general, one can refine the mesh size h (h-methods of adaptivity), increase the density of grid points by relocating nodes (r-methods of adaptivity), increase the local spectral order p of the approximation (p-methods of adaptivity), or use combinations of these techniques (e.g., h-p techniques). Each of these choices puts new demands on the overall approach to the computational problem. In particular, adaptive techniques (1) must generally function on unstructured meshes, (2) require elaborate and complicated data structures, (3) employ explicit or iterative solution techniques, since direct solvers are of limited value on dynamically evolving unstructured meshes, (4) must cope with special issues of stability of numerical schemes that function with continually changing structures and orders, and (5) attempt to minimize the computational overhead of the error estimation and of the implementation of the adaptive process itself. These are the major challenges of adaptive methods in computational fluid dynamics.

In  recent  months,  we  have  attempted  to  meet  these  challenges  through  a  series  of  studies 
on  each  of  the  above  issues.  Some  of  our  results  have  been  implemented  in  a  collection  of 
Navier-Stokes solvers that we refer to as the ADAPT™ code; other results deal with new
data  structures  and  adaptive  methods  and  require  further  work  before  they  represent  effective 
tools  for  complex  flow  simulations.  Some  of  our  results  on  special  issues  of  adaptivity  and 
on  selected  applications  are  briefly  summarized. 


3 Local Approximation of the Navier-Stokes Equations on Unstructured Meshes


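Most of the applications discussed here pertain to numerical solutions of the compressible Navier-Stokes equations. In three dimensions, without body forces or external heat sources, these can be written as

$$\frac{\partial U}{\partial t} + \frac{\partial E}{\partial x} + \frac{\partial F}{\partial y} + \frac{\partial G}{\partial z} = \nabla \cdot S \qquad (3.1)$$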
where U, E, F, and G are vectors and S is a matrix of stresses, power, and heat flux. Here ρ is the total mass density, u, v and w are the velocity components, p is the fluid pressure, τ_ij are the components of the viscous stresses, e is the total energy defined by e = ε + (u² + v² + w²)/2, where ε is the thermodynamic internal energy per unit mass, and q is the heat flux vector.

The constitutive relation used to evaluate the viscous stress τ_ij is given by

$$\tau_{ij} = \mu\,(u_{i,j} + u_{j,i}) + \lambda\, u_{k,k}\,\delta_{ij}$$

where μ and λ are the first and second coefficients of viscosity, u_{i,j} are the components of the velocity gradient (u_{i,j} = ∂u_i/∂x_j, x_1 = x, x_2 = y, x_3 = z), and δ_ij is the Kronecker delta.
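The constitutive relation can be evaluated directly from a velocity-gradient array. In this Python sketch, `grad_u[i][j]` stores u_{i,j} with zero-based indices. The viscosity value, and the use of Stokes' hypothesis λ = -2μ/3 in the example, are illustrative assumptions; the paper does not state either choice.

```python
def viscous_stress(grad_u, mu, lam):
    """tau_ij = mu*(u_i,j + u_j,i) + lam*u_k,k*delta_ij.

    grad_u[i][j] holds u_i,j = du_i/dx_j (x_1 = x, x_2 = y, x_3 = z, zero-based here).
    """
    div_u = sum(grad_u[k][k] for k in range(3))  # u_k,k, the velocity divergence
    return [
        [mu * (grad_u[i][j] + grad_u[j][i]) + (lam * div_u if i == j else 0.0) for j in range(3)]
        for i in range(3)
    ]


if __name__ == "__main__":
    mu = 1.8e-5              # illustrative first viscosity coefficient
    lam = -2.0 * mu / 3.0    # Stokes' hypothesis: an assumption, not stated in the text
    shear = [[0.0, 100.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # u = 100*y, simple shear
    for row in viscous_stress(shear, mu, lam):
        print(row)
```

For simple shear, only τ_12 = τ_21 = μ du/dy is nonzero. With Stokes' hypothesis, the trace of τ vanishes for any velocity gradient.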

In addition to the partial differential equations above, two thermodynamic relations are also needed to close the system of equations. These relations are the ideal-gas state equation

$$p = (\gamma - 1)\,\rho\,\varepsilon \qquad (3.5)$$

and an equation which relates the temperature to the internal energy

$$\varepsilon = c_v T \qquad (3.6)$$

Here γ is the specific heat ratio and c_v is the specific heat at constant volume. With these two additional equations we now have a complete system which can be solved for the vector of unknown quantities (ρ, u, v, w, e) and for p and T.
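Given the conserved variables, Eqs. (3.5) and (3.6) recover pressure and temperature. The definitions of U, E, F, and G are not reproduced in this excerpt, so this Python sketch assumes the usual ordering U = (ρ, ρu, ρv, ρw, ρe), with e the total energy per unit mass. The function name and the air-like constants γ = 1.4 and c_v = 717.5 J/(kg K) in the example are illustrative.

```python
def primitive_from_conserved(U, gamma, c_v):
    """Recover (rho, u, v, w, p, T) from U = (rho, rho*u, rho*v, rho*w, rho*e).

    e is the total energy per unit mass, e = eps + (u^2 + v^2 + w^2)/2, and
    p = (gamma - 1) * rho * eps  (Eq. 3.5),  eps = c_v * T  (Eq. 3.6).
    """
    rho, m_x, m_y, m_z, rho_e = U
    if rho <= 0.0:
        raise ValueError("density must be positive")
    u, v, w = m_x / rho, m_y / rho, m_z / rho
    eps = rho_e / rho - 0.5 * (u * u + v * v + w * w)  # internal energy per unit mass
    p = (gamma - 1.0) * rho * eps
    T = eps / c_v
    return rho, u, v, w, p, T


if __name__ == "__main__":
    gamma, c_v = 1.4, 717.5                  # illustrative, roughly air
    rho, u, T = 1.2, 10.0, 300.0
    e = c_v * T + 0.5 * u * u                # total energy per unit mass
    print(primitive_from_conserved((rho, rho * u, 0.0, 0.0, rho * e), gamma, c_v))
```

The example builds U from a known state and recovers, up to round-off, T = 300 and p = (γ - 1)ρc_vT ≈ 1.03 × 10⁵.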

For  the  class  of  problems  considered  here,  a  weak  formulation  is  defined  in  terms  of  two 
classes  of  functions:  V,  the  class  of  trial  functions,  to  which  the  solution  U  belongs,  and 
W,  the  class  of  test  (or  weight)  functions  which  are  integrated  against  the  residual  of  the 
governing  equations.  The  resulting  weak  form  is: 

Find U in a class V such that

$$-\int_0^T\!\!\int_\Omega \left(U^T\phi_t + E^T\phi_x + F^T\phi_y + G^T\phi_z\right) d\Omega\, dt = -\int_0^T\!\!\int_\Omega S(U) : \nabla\phi \; d\Omega\, dt + \int_0^T\!\!\int_{\partial\Omega} \sigma^T\phi \; ds\, dt \qquad (3.7)$$

for all test functions φ = [φ_1, φ_2, ..., φ_5] in W, where [0, T] is the time interval of interest, Ω is the region through which the fluid moves, ∂Ω is the boundary of the flow region Ω, and σ is the vector of boundary fluxes. It is understood that the viscous stress terms on the right-hand side of (3.7) may also appear in the integrated form,

i  (/„  + L  wm*)  di 


so that differentiability of τ_ij in L1 is not necessarily required.

Our numerical approximation of the flow problem will begin with a discrete approximation of the alternate weak form for a time interval [t_1, t_2]:




Find U = U(x, t) ∈ V such that

(3.8)

for all φ ∈ W.

Here, φ = {φ_1, φ_2, ..., φ_5}^T, dΩ = dx = dx_1 dx_2 dx_3, φ_α ∈ H(·), α = 1, 2, ..., 5, φ_t = ∂φ/∂t, and n is an outward unit normal vector to the boundary. Also, here Q is the convective flux Q(U) = (E, F, G). It is easily verified that (3.8) is equivalent to the entire system of Navier-Stokes equations, the Rankine-Hugoniot jump conditions (when S = 0), and the initial conditions on U (at t = t_1) whenever U is a C⁰-function everywhere except at surfaces of discontinuity where the jump conditions hold.

In a strictly formal way, the finite element approximation of the flow is obtained from the weak statement of the conservation laws by interpreting Ω as a quadrilateral or brick element, replacing U by the discrete approximation U_h, and replacing the test functions φ by the discrete functions φ_h.


4  Various  Adaptive  Methods  and  Data  Structures 

Various Adaptive Strategies. As is well known, several distinctly different adaptive strategies for CFD problems have emerged over the last several years. We classify them as follows:

r-methods (or moving finite element methods). These methods “relocate” grid points in a mesh so that the grid density is large in regions of high error. Here, in general, a fixed number of elements and nodes is used. These classes of methods include those designed merely to enhance orthogonality and smoothness of grids, to reduce the L²-error in residuals, or to equidistribute the error.

h-methods. These methods involve automatic refinement of mesh sizes h. Data structures for h-refinement vary in detail; a critique and survey of such data structures was given by Demkowicz and Oden [3].
p-methods. The p-methods involve the adaptive enrichment of the spectral order of the approximations over subdomains in a fixed grid. The p-methods are closely related to the spectral element methods of Patera (e.g. [4]) and have been extensively developed in the solid mechanics literature by Szabo and his collaborators (e.g. [5]). Adaptive p-methods for the two-dimensional Navier-Stokes equations were first presented in [6].

Combined  methods.  The  most  effective  techniques  generally  involve  a  combination 
of  h  and  r  or  h  and  p  techniques.  However,  the  complexity  of  data  structures  for  some 
combined  adaptive  methods  can  be  substantial  [7]. 


The  h-r  adaptive  strategy  is  regarded  here  as  primarily  a  preprocessing  technique,  wherein 
nodes  are  positioned  in  an  initial  mesh  to  align  gridlines  with  special  flow  features  such  as 
shocks,  boundary-layers,  etc.  Then,  we  superimpose  on  a  pre-processed  r-grid  a  full  h  or 
h-p  adaptive  scheme.  We  describe  in  the  next  section  a  simple  r-adaptive  strategy  adequate 
for  preprocessing  and  a  general  h-p  scheme  for  two-dimensional  problems. 


4.1 An h-Refinement/Unrefinement Strategy.


One h-procedure involves the following steps:

1. For a given domain Ω, a coarse finite element mesh is constructed that contains only enough elements to model the basic geometrical features of the flow domain.

2.  As  our  adaptive  process  will  be  designed  to  handle  groups  of  four  elements  at  a  time 
(for  the  two-dimensional  case),  we  may  generate  a  finer  starting  grid  by  a  bisection 
process  to  obtain  an  initial  set  of  element  groups. 

3. We initiate the numerical solution procedures on this initial coarse grid, and compute error indicators θ_e over all M elements in the grid. Let

$$\theta_{\max} = \max_{1 \le e \le M} \theta_e$$

4. Next, we scan groups of a fixed number P of elements and compute

$$\theta_{\text{group}} = \sum_{k=1}^{P} \theta_{e_k}$$

where e_k is the kth element of the group. We take P = 4 in our current code.
5. Error tolerances are defined by two real numbers, 0 < α, β < 1. If

$$\theta_e > \beta\,\theta_{\max}$$

we refine element e. This is done by bisecting e into four new subelements. If

$$\theta_{\text{group}} \le \alpha\,\theta_{\max}$$

we unrefine the group by replacing it with a single new element with nodes coincident with the corner nodes of the group.

This  general  process  can  be  followed  for  any  choice  of  an  error  indicator.  Moreover,  it  can  also 
be  implemented  at  each  time  step.  Three-dimensional  generalizations  are  straightforward 
with  eight  brick  elements  constituting  a  group. 
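Steps 3 to 5 amount to a marking pass over the error indicators. The Python sketch below returns the elements to bisect and the groups to merge. Two things in it are assumptions. The first is the dictionary/tuple representation of elements and groups. The second is a policy the text leaves open: a group is not coarsened if it contains an element marked for refinement, a conflict that can only arise when β < α.

```python
def mark_for_adaptation(theta, groups, alpha, beta):
    """Steps 3-5: return (elements to refine, indices of groups to unrefine).

    theta  -- dict: element id -> error indicator theta_e
    groups -- list of tuples, each holding the P element ids of one group (P = 4 in 2-D)
    """
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("tolerances must satisfy 0 < alpha, beta < 1")
    theta_max = max(theta.values())                                   # step 3
    refine = {e for e, t in theta.items() if t > beta * theta_max}    # step 5, refinement test
    unrefine = []
    for k, group in enumerate(groups):
        theta_group = sum(theta[e] for e in group)                    # step 4
        # Policy of this sketch: never coarsen a group holding an element marked for refinement.
        if theta_group <= alpha * theta_max and not refine.intersection(group):
            unrefine.append(k)
    return sorted(refine), unrefine


if __name__ == "__main__":
    theta = {0: 0.9, 1: 0.1, 2: 0.05, 3: 0.05, 4: 0.01, 5: 0.01, 6: 0.02, 7: 0.01}
    groups = [(0, 1, 2, 3), (4, 5, 6, 7)]
    print(mark_for_adaptation(theta, groups, alpha=0.1, beta=0.5))
```

With these sample indicators the sketch marks element 0 for bisection and group 1 for merging. The same pass applies unchanged in three dimensions if each group lists eight bricks.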

One  possible  adaptive  scheme  for  time-dependent  problems  is: 

1. Advance the solution N time steps Δt using an appropriate time-marching scheme.

2.  Calculate  error  estimates. 

3.  Refine  the  mesh. 

4.  Redo  the  N  time-step  calculations  using  the  new  refined  mesh. 

5.  Redo  the  error  estimation. 

6.  Unrefine  the  mesh. 

7.  Go  to  1. 

There are several rather obvious alternative versions of this algorithm, but this is the approach used in the sample calculations presented later in this paper.
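The time-dependent cycle can be written as a driver loop. In this Python sketch, the four callbacks (`advance`, `estimate`, `refine`, `unrefine`) are hypothetical hooks standing in for a flow solver. The sketch also assumes a detail the list leaves implicit: step 4 restarts from the solution at the beginning of the N-step window, so the `refine` hook must transfer that saved state onto the refined mesh.

```python
def adaptive_time_marching(mesh, state, cycles, n_steps, dt, advance, estimate, refine, unrefine):
    """Steps 1-7 above, repeated `cycles` times. All four callbacks are hypothetical hooks:

    advance(mesh, state, n_steps, dt) -> state          time-marching scheme
    estimate(mesh, state) -> indicators                 error estimation
    refine(mesh, state, indicators) -> (mesh, state)    must move `state` onto the new mesh
    unrefine(mesh, state, indicators) -> (mesh, state)
    """
    t = 0.0
    for _ in range(cycles):
        start = state                                       # solution at the start of the window
        trial = advance(mesh, start, n_steps, dt)           # 1. advance N steps
        indicators = estimate(mesh, trial)                  # 2. estimate errors
        mesh, start = refine(mesh, start, indicators)       # 3. refine; start state moved to new mesh
        state = advance(mesh, start, n_steps, dt)           # 4. redo the N steps on the refined mesh
        indicators = estimate(mesh, state)                  # 5. re-estimate
        mesh, state = unrefine(mesh, state, indicators)     # 6. unrefine
        t += n_steps * dt                                   # 7. go to 1
    return mesh, state, t


if __name__ == "__main__":
    log = []

    def advance(mesh, state, n, dt):
        log.append("advance")
        return state + n * dt

    def estimate(mesh, state):
        log.append("estimate")
        return [0.0] * mesh

    def refine(mesh, state, indicators):
        log.append("refine")
        return 4 * mesh, state

    def unrefine(mesh, state, indicators):
        log.append("unrefine")
        return mesh // 4, state

    print(adaptive_time_marching(1, 0.0, 2, 5, 0.1, advance, estimate, refine, unrefine), log)
```

The loop only sequences the calls, so it can be exercised with trivial stand-in callbacks, like the recording ones above, before a real solver is attached.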

The h-methods used in all calculations reported here use 1-irregular refined meshes. Full details of these types of h-refinement strategies are discussed in [3].

A  p-Method 

The  idea  of  increasing  the  order  of  an  approximation  while  keeping  mesh  sizes  fixed  is  a 
natural  one  in  the  case  of  problems  with  thin  boundary  layers  or  singularities.  In  results  to 
be  outlined  later,  we  employ  a  hierarchical  p-version  of  the  finite  element  method.  The  idea 
is  to  choose  element  shape  functions  of  the  form 

$$\phi_{ij}(\xi, \eta) = \chi_i(\xi)\,\chi_j(\eta)$$

where

$$\chi_i = \text{polynomial of degree} \le p \text{ in } \xi \in [-1, 1]$$

These polynomials have a hierarchical structure, which ensures that the element matrices corresponding to an approximation of degree p contain as proper submatrices all of those element matrices corresponding to approximations of degree less than p. For a two-dimensional quadrilateral element, the degrees of freedom are the nodal values u_i, i = 1, 2, 3, 4 at the vertices, the tangential derivatives ∂^k u/∂τ^k, k = 1, 2, ..., p at the midsides, and the mixed derivatives ∂^m u/∂ξ^i ∂η^j, i + j = m = 1, 2, ..., p at the centroid.

To fix ideas, consider first the one-dimensional case. In the classical FEM (e.g., Lagrange interpolation), shape functions for the various orders of approximation are constructed independently. For example, passing from a linear element with two linear shape functions to a quadratic element, we construct the three quadratic shape functions independently of the shape functions for the linear element. An alternative way to construct the same second-order approximation is to complete the set of two linear shape functions by including a third, quadratic shape function. At this point the definition of this third shape function and a corresponding degree of freedom is somewhat arbitrary, the only restriction being that the set of shape functions must form a dual basis to the set of degrees of freedom, i.e.,

$$\varphi_i(\chi_j) = \delta_{ij}, \qquad i, j = 0, 1, 2$$

where φ_i, i = 0, 1, 2 denote the degrees of freedom and χ_j, j = 0, 1, 2 the corresponding shape functions. Since the two degrees of freedom associated with the linear shape functions are function values at the endpoints, this implies that the added quadratic shape function must vanish at both endpoints. The remaining hierarchical functions for p > 2 also vanish at the vertices.
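The hierarchical construction can be checked in a few lines. The text does not fix the higher-order functions, so this Python sketch assumes one common choice, Peano-type hierarchical functions on [-1, 1]:

- χ_0 = (1 - ξ)/2 and χ_1 = (1 + ξ)/2;
- for k ≥ 2, χ_k = (ξ^k - 1)/k! when k is even and (ξ^k - ξ)/k! when k is odd.

The matching degrees of freedom are u(-1), u(1), and the derivatives u^(k)(0) for k ≥ 2. Exact rational arithmetic is used to verify three properties: the dual-basis property, that the added functions vanish at both endpoints, and that the degree-(p-1) stiffness matrix is a submatrix of the degree-p one.

```python
from fractions import Fraction
from math import factorial


def hierarchical_basis(p):
    """Coefficient lists (lowest degree first) of chi_0, ..., chi_p on [-1, 1]."""
    basis = [[Fraction(1, 2), Fraction(-1, 2)],  # chi_0 = (1 - xi)/2
             [Fraction(1, 2), Fraction(1, 2)]]   # chi_1 = (1 + xi)/2
    for k in range(2, p + 1):
        c = [Fraction(0)] * (k + 1)
        c[k] = Fraction(1, factorial(k))
        c[0 if k % 2 == 0 else 1] -= Fraction(1, factorial(k))  # makes chi_k vanish at xi = -1, +1
        basis.append(c)
    return basis


def evaluate(c, x):
    return sum(a * Fraction(x) ** i for i, a in enumerate(c))


def derivative(c, m=1):
    for _ in range(m):
        c = [i * a for i, a in enumerate(c)][1:]
    return c


def multiply(a, b):
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def integrate(c):
    """Exact integral of the polynomial over [-1, 1]."""
    return sum(a * (1 - (-1) ** (i + 1)) / (i + 1) for i, a in enumerate(c))


def dofs(c, p):
    """Degrees of freedom: u(-1), u(1), then u^(k)(0) for k = 2..p."""
    return [evaluate(c, -1), evaluate(c, 1)] + [evaluate(derivative(c, k), 0) for k in range(2, p + 1)]


def stiffness(basis):
    """K_ij = integral over [-1, 1] of chi_i' * chi_j'."""
    d = [derivative(c) for c in basis]
    return [[integrate(multiply(di, dj)) for dj in d] for di in d]


if __name__ == "__main__":
    p = 4
    basis = hierarchical_basis(p)
    identity = [[int(i == j) for j in range(p + 1)] for i in range(p + 1)]
    assert [dofs(c, p) for c in basis] == identity          # dual basis
    k_p, k_pm1 = stiffness(basis), stiffness(hierarchical_basis(p - 1))
    assert [row[:p] for row in k_p[:p]] == k_pm1              # nested element matrices
    print(k_p)
```

The nesting is immediate here because raising p only appends functions and never modifies the existing ones. That is exactly the property that lets p be increased without recomputing the lower-order element matrices.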

An  r-Method 

If  the  mesh  size  h  and  the  polynomial  degree  p  are  fixed,  one  can  show  that  the  optimal 
mesh  is  that  for  which  the  nodes  are  positioned  so  that  the  error  is  equidistributed  over  the 
mesh;  i.e.,  the  error  over  each  element  is  the  same.  Diaz,  Kikuchi,  and  Taylor  [8]  have  used 
this fact to produce a simple algorithm for r-adaptivity:

1. For fixed h and p on a mesh of quadrilateral elements, compute error estimators θ_e for all elements in a mesh Ω_h ⊂ ℝ².

2. For each 4-element group, calculate the error per unit area θ_e/A_e of each element in the group.
3. Compute the “centroid of error”

$$y_k = \frac{\sum_{j=1}^{4} r_j\,\theta_j / A_j}{\sum_{j=1}^{4} \theta_j / A_j}, \qquad k = \text{group no.}$$

where r_j is the position vector of the centroid of element j in group k.

4. Relocate the central node in the group to y_k so as to (approximately) equidistribute error over the 4-element group.

5. Continue this process over all groups until the error is equidistributed over the entire mesh.


This  simple  procedure  is  easy  to  implement  and  is  effective  in  many  classes  of  problems. 


A  General  h-p  Data  Structure 

Space  does  not  permit  the  discussion  of  a  general  h-p  data  structure  recently  developed  by 
the  author  and  his  colleagues  [7];  however,  some  h-p  results  will  be  mentioned  later  in  this 
paper. 


5  Some  Sample  Results 

We  next  cite  some  representative  results  obtained  recently  on  adaptive  finite  element  methods 
in  CFD.  Additional  details  are  given  in  earlier  papers  on  this  subject  (e.g.,  [9-30]). 

A. h-Adaptive Schemes for Unsteady Compressible Navier-Stokes Codes for Rotor-Stator
Interaction. Our ADAPT™ 2D and ADAPT™ 3D codes were originally built around an h-adaptive
data structure for transient, subsonic, transonic, and supersonic flows in turbines.
These flow simulators employ an algebraic turbulence model and a sliding mesh technique to
model the motion of rotor blades relative to stator blades in turbomachinery. An Euler code
for these problems, which was reported in [14, 19], was a predecessor of these programs.

Typical  results  of  a  two-dimensional  rotor-stator  calculation  are  shown  in  Figs.  1-4. 
There  one  sees  a  dynamically  changing  mesh  generated  after  each  of  a  specified  number  of 
time steps in such a way as to reduce computed errors below a preset error tolerance. Note the
continuity  of  density  (and  pressure)  contours  across  mesh  interfaces  and  the  interaction  of 
shocks  on  the  moving  turbine  blades.  The  code  also  computes  the  time  history  of  stresses 
in the blade due to fluid pressure and shear. It is also interesting to note that flows during
multiple cycles of the blade rows have been computed, from a start-up uniform flow to the flow
with periodic structure which exists after the turbine is in operation. Such calculations
simulate as many as 21 complete blade revolutions and 100,000 time steps. Note also that
the adaptive process, which consumes only 1 percent of the total calculation, uses around
3,500 elements to deliver the desired accuracy at any particular time during the computation,
while a uniform mesh needed for the same accuracy consists of around 14,000 elements. This
version of ADAPT™ 2D employs bilinear quadrilateral elements; the 3D code uses trilinear
brick elements.

Figure 1: Dynamically evolving adaptive grid at one time instant for Navier-Stokes solution
of rotor-stator interaction. Mesh for rotor blades on right is moving relative to fixed stator
blade mesh along sliding interface.

Figure 2: Computed instantaneous Mach number contours at 3.5 cycles for rotor-stator flow
interaction simulation.

Figure 3: Computed instantaneous density contours at 3.5 cycles of rotor-stator flow
interaction simulation.

Figure 4: Computed instantaneous pressure contours at 3.5 cycles of viscous rotor-stator
flow interaction.

New  results  on  three-dimensional  calculations  are  shown  in  Figs.  5-10.  A  three- 
dimensional,  three-level  adapted  mesh  around  a  pair  of  stationary  blades  is  shown  in  Fig. 
5. Pressure contours on planes normal to the blade are shown in Fig. 6 with the adapted
grid  on  the  leading  blade  shown  again  in  Fig.  7.  Results  for  moving  blades  are  in  Figs.  8, 
9,  and  10  with  the  computed  adaptive  grid  for  a  rotor  blade  moving  with  respect  to  a  fixed 
stator  shown  in  Figs.  9  and  10  after  1.5  and  6  cycles,  respectively. 

B.  Low-Mach  Number  Flow  Around  a  Cylinder.  Figures  11  and  12  show  the  versatility  of 
the 2D h-adaptive strategy for subsonic flow around a cylinder. Note the dynamically changing
mesh and the resolution of vortices spinning off the cylinder at M = 0.65. Fully implicit
schemes  which  function  on  unstructured  meshes  are  being  developed  for  these  problems,  but 
the  results  shown  were  obtained  with  an  explicit  flow  solver,  the  effectiveness  of  which  was 
made  possible  by  the  use  of  a  near-optimal  mesh  at  the  end  of  each  of  a  designated  collection 
of  time  steps. 

Three-dimensional  results  for  the  cylinder  are  shown  in  Figs.  13  and  14. 

C.  Supersonic  Flow  Over  a  Ramp.  Adapted  meshes  and  density  contours  for  flow  over  a 
three-dimensional  ramp  are  shown  in  Figs.  15  and  16. 

D. An h-r Adaptive Calculation of Shock Structure on a Blunt Body. A moving-node
technique (an r-method) is used to condition mesh structures prior to an h-adaptive
calculation. Figures 17 and 18 show a typical calculation. There we observe an h-r adaptive mesh
and  density  contours  of  an  inviscid  gas  impinging  on  a  blunt  body.  Our  results  indicate 
that  r-method  preprocessing  can  be  beneficial  in  aligning  the  initial  mesh  with  shocks  in 
steady  supersonic  flow  problems  with  the  result  that  a  given  level  of  h-refinements  produces 
better  solutions  than  a  pure  h-process  which  is  initiated  on  an  unaligned  mesh. 

E.  A  New  h-p  Scheme  for  Optimal  Computations.  Both  two-  and  three-dimensional 
h-p adaptive codes are operational for the analysis of general linear boundary-value
problems. Extensions to steady-state Euler equations are under study. These codes employ an
optimization algorithm which chooses the optimal distribution of h (mesh size) and p
(polynomial degree/spectral order) to produce a solution with a given level of local accuracy with
a minimum number of unknowns.
Figure 6: Supersonic flow past a rotor blade. Pressure contours on surfaces through blade
cross sections.

Figure 7: Supersonic flow over a rigid blade in motion, adapted grid at 300 steps.

Figure 8: Supersonic flow over a rigid blade in motion. (a) Pressure contours after 300 time
steps, (b) density contours after 300 time steps.

Figure 9: Rotor-stator interaction simulation. (a) Density contours after 1 cycle, (b) density
contours after ~1.5 cycles.

Figure 10: Rotor-stator interaction simulation, density contours for ~6 cycles.

Figure 11: Viscous cylinder problem with M = 0.64, flow perturbed after 2000 time steps.
Vortices are generated and shed; (a) instantaneous grid and (b) density contours.

Figure 13: Subsonic flow past a rigid cylinder, Mach = 0.41. (a) Initial grid, (b) density
contours.

(a) Pressure contours, (b) Mach contours.

Figure 15: Supersonic flow over a 15° ramp. (a) Density contours at 60 steps, (b) density
contours at 120 steps.

Figure 17: An h-r adaptive mesh with combined node relocation and mesh refinement.

Figure 19: An instantaneous adapted h-p mesh with overlaid density contours for viscous
compressible flow over a deforming elastic plate; high-order spectral elements are used to
model the viscous boundary layer while h-adaptive refinement is used to capture the shock.
Figure 19 contains an optimal h-p mesh and density contours over a flexible elastic plate
deformed by the action of a viscous compressible fluid. Low-order elements are used to
capture the shock, while higher-order elements are used to model the viscous boundary layer.
The flow is quasi-steady, and a fully implicit scheme with a multigrid iterative solver is used
to compute successive optimal meshes as the plate deforms.
ABSTRACT. We model the viscoplastic response of an HY-100 steel by a power
law and by the flow rules proposed by Litonski, by Bodner and Partom, and by Johnson and
Cook. Each of these flow rules is first calibrated by using the torsional
test data at a strain-rate of 3,300 sec⁻¹. These material models are then
used to study the thermomechanical deformations of a block made of the HY-100
steel and undergoing simple shearing deformations at a nominal strain-rate of
5,000 sec⁻¹. A material defect is simulated by assuming a nonuniform initial
temperature distribution within the block. Whereas all of the flow rules used
predict a rapid drop of the shear stress as a shear band forms, only the
Litonski law for nonpolar materials predicts an unloading elastic wave emanating
outwards from the shear band.

INTRODUCTION. Noting that Batra (1987) has briefly reviewed the work done
on shear bands up to 1986, we discuss below some of the work done since then.
For strain-rate hardening but thermally softening materials, Wright and Walter
(1987) found that the shear stress within a band collapses rapidly as the band
grows. Batra and Kim (1989a) also accounted for material elasticity and work
hardening effects and found that if the rate of collapse of the shear stress
is large, then an unloading elastic wave emanates outwards from the shear band
and propagates towards the boundaries of the specimen. The development of
shear bands in plane strain problems has been studied by, among others, Anand
et al. (1988), Needleman (1989), LeMonds and Needleman (1986a, 1986b), and Batra
and Liu (1989a, 1989b). These works have employed different flow rules and
have modeled a material defect either by introducing a temperature perturbation
or by assuming the existence of a weak material at the site of the defect.
Batra and Kim (1989b) have recently studied the development of a shear band in
a block of HY-100 steel undergoing overall simple shearing adiabatic deformations
and compared computed results with the experimental observations of Marchand
and Duffy (1988). They found that the dipolar theory due to Wright and
Batra (1987) and Batra (1987, 1989) and the Bodner-Partom (1975) law predict
most of the features of the shear band.

We  note  that  Molinari  and  Clifton  (1987),  Tzavaras  (1987)  and  Wright 
(1989)  have  studied  the  problem  analytically.  For  rigid/perfectly  plastic 
materials,  Wright  (1989)  has  developed  a  criterion  that  ranks  materials 
according  to  their  tendency  to  form  adiabatic  shear  bands.  Hartley  et  al. 
(1987),  Giovanola  (1987),  and  Marchand  and  Duffy  (1988)  have  reported  the 
observed  histories  of  the  temperature  and  strain  within  a  band  as  it  develops. 

Here we presume that the torsional experiments on thin-walled steel tubes
can be analyzed by studying the thermomechanical deformations of a viscoplastic
block undergoing overall adiabatic simple shearing deformations. We find
the values of the material parameters appearing in different flow rules by
solving an initial-boundary-value problem and comparing computed results with
the experimental stress-strain curve at a nominal strain-rate of 3,300 sec⁻¹.
These flow rules are then used to compute the initiation and growth of a shear
band when the applied nominal strain-rate is 5,000 sec⁻¹. It is found that
the rate of stress drop during the growth of a shear band as predicted by the
Bodner-Partom law and the dipolar theory due to Wright and Batra (1987) is
similar to that observed experimentally.

*Supported by the U.S. Army Research Office Contract DAAL 03-88-K-0184 to the
University of Missouri-Rolla.

GOVERNING EQUATIONS. In terms of nondimensional variables, the equations
governing the thermomechanical deformations of a viscoplastic block undergoing
overall adiabatic deformations are (see, e.g., Batra and Kim (1989a))


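$$\rho\dot{v} = (s - \ell\sigma_{,y})_{,y}, \qquad 0 < y < 1, \qquad (2.1)$$

$$\dot{\theta} = k\theta_{,yy} + s\dot{\gamma}_p + \ell\sigma d_p, \qquad 0 < y < 1, \qquad (2.2)$$

$$\dot{s} = \mu(v_{,y} - \dot{\gamma}_p), \qquad (2.3)$$

$$\dot{\sigma} = \mu\ell(v_{,yy} - d_p), \qquad (2.4)$$

$$\dot{\gamma}_p = g(s, \sigma, \gamma_p, d_p, \theta, \ell), \qquad (2.5)$$

$$d_p = \ell\,h(s, \sigma, \gamma_p, d_p, \theta, \ell). \qquad (2.6)$$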


These equations, written for dipolar materials, reduce to those for nonpolar
materials when $\ell$ is set equal to zero. Here $\rho$ is the mass density, $v$
the velocity of a material particle in the direction of shearing, a superimposed
dot indicates the material time derivative, $s$ is the shear stress, $\ell$
a material characteristic length, $\sigma$ the dipolar stress, and a comma followed
by $y$ signifies partial differentiation with respect to $y$. Furthermore, $k$ is
the thermal conductivity, $\dot{\gamma}_p$ the plastic strain-rate, $d_p$ the dipolar plastic
strain-rate, $\mu$ the shear modulus, and $\theta$ the temperature change from that in
the reference configuration. Equation (2.1) expresses the balance of linear
momentum and (2.2) the balance of internal energy; equations (2.3)-(2.6) are
constitutive relations. The viscoplastic flow rules differ in the
functional forms of $g$ and $h$, which are given in the next section.

For the initial conditions we take

$$v(y,0) = 0, \quad s(y,0) = 0, \quad \sigma(y,0) = 0, \quad \theta(y,0) = \varepsilon(1 - y^2)^9 e^{-5y^2}. \qquad (2.7)$$

That is, the block is initially at rest and stress free. The initial temperature
distribution simulates the defect or inhomogeneity in the block assumed to be present
near the point $y = 0$, and the value of $\varepsilon$ represents the strength of the defect.
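To see what the perturbation in Eq. (2.7) looks like, the short sketch below evaluates $\theta(y,0)$ for a given defect strength $\varepsilon$. The function name is hypothetical; the formula is the one in Eq. (2.7).

```python
import math


def initial_temperature(y: float, eps: float) -> float:
    """Initial temperature rise theta(y, 0) = eps * (1 - y**2)**9 * exp(-5 y**2), Eq. (2.7).

    The perturbation peaks (value eps) at the defect site y = 0 and vanishes at y = 1.
    """
    return eps * (1.0 - y * y) ** 9 * math.exp(-5.0 * y * y)


# Profiles for a few defect strengths (deg C), sampled at selected heights y.
for eps in (1.0, 2.0, 5.0, 9.0):
    print(eps, [round(initial_temperature(y, eps), 4) for y in (0.0, 0.1, 0.25, 0.5)])
```

The factor $(1 - y^2)^9$ makes the perturbation decay steeply, so the defect is concentrated in a thin layer near $y = 0$.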

We  presume  that  the  overall  deformations  of  the  block  are  adiabatic  and 
the  lower  surface  is  at  rest  while  the  upper  surface  is  assigned  a  velocity 
that increases linearly from 0 to 1 in time $t_r$ and then stays at the constant
value  of  1.0.  Thus, 

$$\theta_{,y}(0,t) = 0, \quad \theta_{,y}(1,t) = 0, \quad v(0,t) = 0, \qquad (2.8)$$

$$v(1,t) = \begin{cases} t/t_r, & 0 \le t \le t_r, \\ 1, & t > t_r, \end{cases} \qquad (2.9)$$

and for dipolar materials, we also assume that

$$\sigma(0,t) = 0, \quad \sigma(1,t) = 0. \qquad (2.10)$$

Computations for the domain $-1 < y < 1$ with boundary conditions $\sigma(-1,t) = 0$ and
$\sigma(1,t) = 0$ have given $\sigma(0,t) = 0$.
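The authors solve this problem with the finite element method. Purely as an illustration of how Eqs. (2.1)-(2.3), (2.5), (2.8), and (2.9) fit together, the sketch below advances the nonpolar case ($\ell = 0$) by one forward-Euler finite-difference step, which is a different and much simpler discretization than the one used in the paper. The coefficients `rho`, `mu`, `k` and the callback `flow_rule(s, theta, gamma_p)` are hypothetical placeholders; any of the flow rules of Section 3 could be supplied.

```python
from typing import Callable, List, Tuple

FlowRule = Callable[[float, float, float], float]


def explicit_step(v: List[float], s: List[float], theta: List[float],
                  gamma_p: List[float], t: float, dt: float, dy: float,
                  flow_rule: FlowRule, rho: float = 1.0, mu: float = 1.0,
                  k: float = 0.0, t_r: float = 0.033
                  ) -> Tuple[List[float], List[float], List[float], List[float]]:
    """One forward-Euler step of the nonpolar equations (ell = 0).

    v lives at the nodes y_i = i*dy (len(s) + 1 values); s, theta and gamma_p
    live at element midpoints. Stability roughly requires dt < dy*sqrt(rho/mu)
    and, when k > 0, dt < dy**2 / (2k).
    """
    n_el = len(s)
    rates = [flow_rule(s[e], theta[e], gamma_p[e]) for e in range(n_el)]  # Eq. (2.5)

    # Eq. (2.3): s_dot = mu * (v_,y - gamma_p_dot)
    new_s = [s[e] + dt * mu * ((v[e + 1] - v[e]) / dy - rates[e]) for e in range(n_el)]

    # Eq. (2.2) with ell = 0; mirrored neighbours impose theta_,y = 0 at both ends, Eq. (2.8)
    new_theta = []
    for e in range(n_el):
        left = theta[e - 1] if e > 0 else theta[e]
        right = theta[e + 1] if e < n_el - 1 else theta[e]
        lap = (left - 2.0 * theta[e] + right) / dy ** 2
        new_theta.append(theta[e] + dt * (k * lap + s[e] * rates[e]))

    # Accumulated plastic strain
    new_gamma = [gamma_p[e] + dt * rates[e] for e in range(n_el)]

    # Eq. (2.1) with ell = 0: rho * v_dot = s_,y at interior nodes
    new_v = list(v)
    for i in range(1, n_el):
        new_v[i] = v[i] + dt / rho * (s[i] - s[i - 1]) / dy

    # Eqs. (2.8)-(2.9): lower face at rest, upper face velocity ramped over t_r
    new_v[0] = 0.0
    new_v[-1] = min((t + dt) / t_r, 1.0)
    return new_v, new_s, new_theta, new_gamma
```

Starting from the rest state of Eq. (2.7) with $\varepsilon = 0$, the first step only moves the upper face; shear stress appears in the top element on the next step and then propagates downward as an elastic wave.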

3. VISCOPLASTIC FLOW RULES. In order to calibrate the various flow rules
against the shear stress-shear strain curve given by Marchand and Duffy (1988)
for a strain-rate of 3,300 sec⁻¹, we solved numerically the
initial-boundary-value problem outlined above with

$$s(y,0) = 1.0, \quad \gamma_p(y,0) = 0.012, \quad v(y,0) = y, \quad \theta(y,0) = 0\,^\circ\mathrm{C}, \quad \varepsilon = 0,$$

$$t_r = 0.033, \quad \rho = 7{,}860\ \mathrm{kg/m^3}, \quad c = 473\ \mathrm{J/(kg\,^\circ C)}, \quad k = 49.73\ \mathrm{W/(m\,^\circ C)}, \quad H = 2.5\ \mathrm{mm},$$

$$\dot{\gamma}_0 = 3{,}300\ \mathrm{sec^{-1}}.$$

Here $H$ is the height of the block and $\dot{\gamma}_0$ is the average applied strain-rate.
With no initial temperature perturbation, the block deforms uniformly and
homogeneously and the dipolar effects vanish identically. As far as possible,
we kept the values of the strain-hardening exponent and the strain-rate-hardening
exponent equal to those given by Marchand and Duffy (1988), and
adjusted the values of the other parameters until the computed stress-strain
curve came out close to that given by Marchand and Duffy.

3.1  Litonski's  Law  for  Nonpolar  and  Dipolar  Materials.  Wright  and  Batra 
(1987)  generalized  the  constitutive  relation  proposed  by  Litonski  (1977)  to  be 
applicable  to  nonpolar  and  dipolar  materials.  Batra  and  his  co-workers 
(1987,  1988,  1989)  have  used  it  to  study  the  initiation  and  growth  of  shear 
bands.  It  may  be  written  as: 


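$$\dot{\gamma}_p = \Lambda s, \qquad d_p = \frac{\Lambda}{\ell}\,\sigma, \qquad (3.1)$$

$$\Lambda = \frac{1}{b\,s_e}\max\left\{0,\ \left[\frac{s_e}{(1 - a\theta)\,(1 + \psi/\psi_0)^n}\right]^{1/m} - 1\right\}, \qquad (3.2)$$

$$s_e = \left(s^2 + \sigma^2\right)^{1/2}, \qquad (3.3)$$

$$\dot{\psi} = \frac{\Lambda s_e^2}{(1 + \psi/\psi_0)^n}. \qquad (3.4)$$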
Here $\psi$ can be viewed as an internal variable that describes the work hardening
of the material. Its evolution is given by equation (3.4). In equation
(3.2), $(1 - a\theta)$ describes the softening of the material due to its heating, $b$
and $m$ characterize its strain-rate hardening, and $\psi_0$ and $n$ its work hardening.
The following values of the material parameters resulted in a stress-strain curve
that was close to the one observed experimentally:

$$a = 0.00185/^\circ\mathrm{C}, \quad \psi_0 = 0.012, \quad n = 0.107, \quad m = 0.0117, \quad b = 10^4\ \mathrm{sec}, \quad \ell = 0.005.$$
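The generalized Litonski law can be evaluated pointwise once the stresses, the temperature rise, and the internal variable are known. The sketch below implements Eqs. (3.1)-(3.4) with the parameter values just listed; it assumes that $s$ and $\sigma$ are nondimensional (scaled by the quasi-static yield stress) and that $\theta$ is the temperature rise in °C, so rates come out in sec⁻¹ because $b$ is given in seconds. The function name `litonski_rates` is hypothetical.

```python
import math
from typing import Tuple


def litonski_rates(s: float, sigma: float, theta: float, psi: float,
                   a: float = 0.00185, psi0: float = 0.012, n: float = 0.107,
                   m: float = 0.0117, b: float = 1.0e4, ell: float = 0.005
                   ) -> Tuple[float, float, float]:
    """Return (gamma_p_dot, d_p, psi_dot) from Eqs. (3.1)-(3.4).

    Set sigma = 0 for a nonpolar material.
    """
    if 1.0 - a * theta <= 0.0:
        raise ValueError("thermal softening factor (1 - a*theta) must stay positive")
    s_e = math.hypot(s, sigma)                      # Eq. (3.3)
    if s_e == 0.0:
        return 0.0, 0.0, 0.0
    hardening = (1.0 + psi / psi0) ** n             # work-hardening factor
    ratio = s_e / ((1.0 - a * theta) * hardening)
    overstress = ratio ** (1.0 / m) - 1.0
    lam = max(0.0, overstress) / (b * s_e)          # Eq. (3.2): zero inside the yield surface
    gamma_p_dot = lam * s                           # Eq. (3.1)
    d_p = lam * sigma / ell                         # Eq. (3.1)
    psi_dot = lam * s_e ** 2 / hardening            # Eq. (3.4)
    return gamma_p_dot, d_p, psi_dot


print(litonski_rates(1.01, 0.0, 0.0, 0.0))   # slightly above yield at room temperature
print(litonski_rates(1.01, 0.0, 50.0, 0.0))  # same stress, 50 deg C hotter: faster flow
```

The `max` in Eq. (3.2) is what gives this flow rule a yield surface: for $s_e$ below $(1 - a\theta)(1 + \psi/\psi_0)^n$ the response is purely elastic.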

3.2  Power  Law.  For  nonpolar  materials  and  assuming  that  there  is  no 
loading  surface,  this  flow  rule  for  the  HY-100  steel  can  be  written  as 


$$\dot{\gamma}_p = (10^{-4})\,\frac{s^{85.47}}{(1 + \gamma/0.012)^{9.145}\,(\theta/300)^{-64.103}} \qquad (3.5)$$

Here $\theta$ is the current temperature in kelvins and $\gamma$ is the total strain
at a material particle.

3.3 Bodner-Partom Law. For the HY-100 steel, the constitutive relation
proposed by Bodner and Partom (1975) can be written as

$$\dot{\gamma}_p = 10^{8}\exp\left[\,\cdots\,\right], \qquad n = \cdots, \qquad K = 1600 - 300\exp(-5W_p). \qquad (3.6)$$

Here $\theta$ is the absolute temperature of a material particle and $W_p$ is the
plastic work done.

3.4 Johnson-Cook Law. The constitutive relation proposed by Johnson
and Cook (1983) takes the following form for the HY-100 steel:

$$\dot{\gamma}_p = \exp\left[\frac{1}{0.0277}\left(\frac{s}{(0.45 + 1.433\,\gamma_p^{0.107})\,(1 - T^{0.7})} - 1.0\right)\right], \qquad T = \frac{\theta - \theta_0}{1200}. \qquad (3.7)$$

Here $\theta_0$ equals the ambient temperature.
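Equation (3.7) is the Johnson-Cook relation solved for the plastic strain-rate. A minimal sketch of its evaluation follows; it assumes that $s$ is expressed in the same stress units as the constants 0.45 and 1.433 (the units are not stated here) and that the reference strain-rate is 1, so that $\dot{\gamma}_p = 1$ when $s$ equals the quasi-static flow stress. Restricting $0 \le T < 1$ is this sketch's choice, since a negative $T$ would make $T^{0.7}$ complex. The function name is hypothetical.

```python
import math


def johnson_cook_rate(s: float, gamma_p: float, theta: float, theta0: float) -> float:
    """Plastic strain-rate from Eq. (3.7) with T = (theta - theta0) / 1200."""
    if gamma_p < 0.0:
        raise ValueError("plastic strain gamma_p must be non-negative")
    T = (theta - theta0) / 1200.0
    if not 0.0 <= T < 1.0:
        raise ValueError("this sketch evaluates Eq. (3.7) only for 0 <= T < 1")
    # Strain hardening times thermal softening
    flow_stress = (0.45 + 1.433 * gamma_p ** 0.107) * (1.0 - T ** 0.7)
    return math.exp((s / flow_stress - 1.0) / 0.0277)


print(johnson_cook_rate(0.46, 0.0, 300.0, 300.0))  # small overstress at ambient temperature
print(johnson_cook_rate(0.46, 0.0, 400.0, 300.0))  # hotter: thermal softening raises the rate
```

Because the stress enters through an exponential with the small constant 0.0277, a modest overstress produces a very large change in the strain-rate, which reflects the weak strain-rate sensitivity of the Johnson-Cook model.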


4. DETERMINATION OF THE SIZE OF THE PERTURBATION. Here we model the
cumulative effect of the change in the thickness of the specimen, and possibly
the slight variation in the material properties, by assuming a nonuniform initial
temperature distribution as given by Eqn. (2.7). For the different flow
laws, the value of $\varepsilon$ was determined so as to initiate a shear band, as signified
by a rapid drop in the shear stress, at a value of the average strain
close to that found experimentally. The initial-boundary-value problem outlined
in Section 2 with $t_r = 0.033$ was solved by the finite element method.
Values of $\varepsilon$ equal to 1 °C, 2 °C, 5 °C and 9 °C for the Litonski law for
nonpolar and dipolar materials, the Power law, the Bodner-Partom law and the
Johnson-Cook law, respectively, result in the stress-strain curves shown in Fig. 1.


Fig. 1. Shear stress-shear strain curves computed with different flow
rules and with different initial temperature perturbations. Curves: experimental,
Bodner-Partom, Litonski (nonpolar), Litonski (dipolar), Power, Johnson-Cook.


These curves show that, until the shear stress begins to drop rapidly, all of
the flow rules considered predict material behavior in reasonable agreement with
the experimental observations. For nonpolar materials, Litonski's law, the Power
law and the Johnson-Cook law give an essentially catastrophic drop in the shear
stress with virtually no increase in the nominal shear strain. This does not agree
with the experimental data, since Marchand and Duffy observed that during the drop
of the shear stress the nominal strain increases by approximately 5 percent. The
Litonski law for dipolar materials and the Bodner-Partom law for nonpolar materials
do predict the gradual drop in the shear stress, in agreement with the experimental data.
However, for the Bodner-Partom law the shear stress does not drop as much as
it does during the tests, and it reaches a plateau.
5. RESULTS FOR A NOMINAL STRAIN-RATE OF 5,000 SEC⁻¹. With the values of the
material parameters and the size of the temperature perturbation found above
kept fixed, we increased the prescribed velocity on the upper boundary so as
to deform the block at a nominal strain-rate of 5,000 sec⁻¹. Note that the
values of some of the nondimensional variables appearing in the governing
equations will change. For each of the flow rules used, the shear stress
attained a maximum value when the average shear strain was approximately equal
to 0.30. For subsequent deformations, we have plotted in Figs. 2 and 3 the
evolution of the shear stress and the particle velocity within the specimen.
The value of the nominal shear strain at which the shear stress drops and the
shear band initiates depends upon the flow rule used. However, in each case,
the value of the nominal shear strain when a band initiates is noticeably greater
than the value at which the shear stress attains its maximum.

For nonpolar materials, the rate of drop of the shear stress is highest
for the Litonski law as compared to that for the other three flow rules used.
For the Bodner-Partom law, the shear stress drops initially but then seems to
reach a plateau. For the Power law, the shear stress oscillates both in space
and time, and no unloading wave was observed. With the Johnson-Cook law,
the shear stress drops almost as rapidly as with the Litonski law but seems
to stay uniform throughout the specimen. For the Litonski law, as the shear
stress drops, an unloading elastic wave emanates out of the shear band and
travels towards the other end of the specimen. Batra and Kim (1989a) found
this unloading wave, and their computed wave speed was very close to the
analytical value. The propagation of the wave is clearer from the
particle velocity plot depicted in Fig. 3. We note that we assumed the
existence of a yield surface only for the Litonski law. For the other flow rules,
plastic deformations are assumed to occur at all times.

For dipolar materials, only Litonski's law as generalized by Wright and
Batra was used. In this case, even though the shear stress drop near the center
was larger than that for nonpolar materials, no wave phenomenon
was noticed. This becomes evident from the velocity plot in Fig. 3.

For nonpolar materials, the velocity plots indicate that the particle
velocity increases rapidly from zero at $y = 0$ to as high as 2 at a point close
to $y = 0$ and then decreases to the prescribed value of 1 at $y = 1.0$. The
overshoot in the particle velocity is highest for the Litonski law. The flow
rule used affects the evolution of the particle velocity significantly. With
the Johnson-Cook law, no oscillations in the particle velocity are observed.
With the Bodner-Partom flow rule, no spatial oscillations in the particle
velocity are seen, but after a shear band has initiated, the velocity of a
material particle oscillates in time. The spatial and temporal variation in
the particle velocity with the Power law is noticeably different from that
computed with the other three flow rules. A glance at the velocity and the
shear stress plots seems to indicate that there is no unloading wave emanating
out of the shear band in this case.

6. CONCLUSIONS. For overall adiabatic simple shearing thermomechanical
deformations of a viscoplastic block, we first calibrated the four different
flow rules so as to give essentially identical shear stress-shear strain
curves at a nominal strain-rate of 3,300 sec⁻¹. Then, the size of the initial
temperature perturbation was adjusted to yield the initiation of the shear
band, as indicated by a significant drop in the shear stress for very little
change in the nominal shear strain, at almost the same value of the nominal
strain. When used to compute the initiation and growth of shear bands at a
nominal strain-rate of 5,000 sec⁻¹, these flow rules gave noticeably different
values of the nominal strain at which a shear band initiates. Also, the rate
of drop of the shear stress predicted by the Bodner-Partom law and by the
dipolar theory of Wright and Batra was closer to that observed experimentally.
For nonpolar materials, the Litonski law predicts the emanation of an unloading
elastic wave out of the shear band as it grows. The other three flow
rules also give the overshoot in the particle velocity at the edges of the band
predicted by the Litonski law, but do not predict the propagation of the
unloading elastic wave. This could possibly be due to the use of a yield
criterion for the Litonski law and not for the other flow rules.

Fig. 2. The evolution of the shear stress within the specimen after the shear
stress has attained its peak value. (a) Litonski law, (b) Litonski's flow rule
for dipolar materials, (c) Power law, (d) Bodner-Partom law, (e) Johnson-Cook law.

Fig. 3. The evolution of the velocity field within the specimen after the shear
stress has attained its peak value. (a) Litonski's law, (b) Litonski's flow rule
for dipolar materials, (c) Power law, (d) Bodner-Partom law, (e) Johnson-Cook law.
REFERENCES

Anand, L., Lush, A. M., and Kim, K. H., 1988, "Thermal Aspects of Shear
Localization in Viscoplastic Solids", Thermal Aspects in Manufacturing,
M. H. Attia and L. Kops, eds., ASME-PED-Vol. 30, pp. 89-103.

Batra, R. C., 1987, "The Initiation and Growth of, and the Interaction Among,
Adiabatic Shear Bands in Simple and Dipolar Materials", Int. J. Plasticity,
Vol. 3, pp. 75-89.

Batra, R. C. and Kim, C. H., 1989a, "Adiabatic Shear Banding in Elastic-
Viscoplastic Nonpolar and Dipolar Materials", Int. J. Plasticity (in press).

Batra, R. C. and Kim, C. H., 1989b, "Effect of Viscoplastic Flow Rules on the
Initiation and Growth of Shear Bands at High Strain Rates" (pending publication).

Batra, R. C. and Liu, De-Shin, 1989a, "Adiabatic Shear Banding in Plane Strain
Problems", J. Appl. Mechs., Vol. 56 (in press).

Batra, R. C. and Liu, De-Shin, 1989b, "Adiabatic Shear Banding in Dynamic
Plane Strain Compression of a Viscoplastic Material", Int. J. Plasticity
(in press).

Batra, R. C., 1989, "Effect of Nominal Strain Rates on Adiabatic Shear Banding
in Dipolar Materials", Proc. 1st Pan American Congress of Applied Mechs.,
Rio de Janeiro, Brazil, pp. 79-82.

Bodner,  S.  R.  and  Partom,  Y. ,  1975,  "Constitutive  Equations  for  Elastic- 
Viscoplastic  Strain-Hardening  Materials",  J.  Appl.  Mechs.,  Vol.  42,  pp. 
385-389. 

Giovanola,  J.,  1987,  Proc.  Impact  Loading  and  Dynamic  Behavior  of  Materials, 
Bremen,  W.  Germany. 

Hartley, K. A., Duffy, J. and Hawley, R. H., 1987, "Measurement of the Temperature
Profile During Shear Band Formation in Steels Deforming at High
Strain Rates", J. Mechs. Phys. Solids, Vol. 35, pp. 283-.


Johnson, G. R. and Cook, W. H., 1983, "A Constitutive Model and Data for Metals
Subjected to Large Strains, High Strain Rates and High Temperatures",
Proc. 7th Int. Symp. Ballistics, The Hague, The Netherlands, pp. 1-7.

Kwon, Y. W. and Batra, R. C., 1988, "Effect of Multiple Initial Imperfections
on the Initiation and Growth of Adiabatic Shear Bands in Nonpolar and
Dipolar Materials", Int. J. Engng. Sci., Vol. 26, pp. 1177-1187.

LeMonds, J. and Needleman, A., 1986a, "Finite Element Analysis of Shear
Localization in Rate and Temperature Dependent Solids", Mechs. Materials,
Vol. 5, pp. 339-.

LeMonds, J. and Needleman, A., 1986b, "An Analysis of Shear Band Development
Incorporating Heat Conduction", Mechs. Materials, Vol. 5, pp. 363-.

Litonski, J., 1977, "Plastic Flow of a Tube Under Adiabatic Torsion", Bulletin
de l'Académie Polonaise des Sciences, Sciences Tech., Vol. 25, pp. 7-14.

Marchand, A. and Duffy, J., 1988, "An Experimental Study of the Formation of
Adiabatic Shear Bands in a Structural Steel", J. Mech. Phys. Solids,
Vol. 36, pp. 251-.

Molinari, A. and Clifton, R. J., 1987, "Analytical Characterization of Shear
Localization in Thermoviscoplastic Materials", J. Appl. Mech., Vol. 54,
pp. 806-812.

Needleman, A., 1989, "Dynamic Shear Band Development in Plane Strain",
J. Appl. Mechs., Vol. 56, pp. 1-8.

Tzavaras, A. E., 1987, "Effect of Thermal Softening in Shearing of Strain-Rate
Dependent Materials", Arch. Rat. Mech. Anal., Vol. 99, pp. 349-374.

Wright, T. W. and Batra, R. C., 1987, "Adiabatic Shear Bands in Simple and
Dipolar Plastic Materials", Proc. Macro- and Micro-Mechanics of High
Velocity Deformation and Fracture, IUTAM Symp., Tokyo, Aug. 1985,
pp. 189-201.

Wright, T. W., 1989, "Approximate Analysis for the Formation of Adiabatic
Shear Bands", J. Mechs. Phys. Solids (in press).


A  PROPERTY  OF 

LINEAR  FEEDBACK  SHIFT  REGISTER  SEQUENCES 


Harold  Fredricksen 
Mathematics  Department 
Naval  Postgraduate  School 
Monterey,  CA  93940 


and 


Gary  Krahn 

Mathematics  Department 
United  States  Military  Academy 
West  Point,  NY  10996 


I.  INTRODUCTION 

Modern high-speed communication demands high-speed techniques to generate random-like sequences. One of the simplest and most efficient devices for generating deterministic, random-looking binary sequences is the shift register. In a sense, no sequence that depends on a few parameters, such as the feedback connections of a linear feedback shift register, can be considered truly random. Solomon W. Golomb introduced the name “pseudo-random” for periodic binary maximum-length linear sequences (m-sequences) because they satisfy the three randomness properties of balance, runs, and correlation. These “pseudo-random” sequences have a great deal of combinatorial and algebraic structure. Their suitability derives from their efficiency of generation and, more importantly, from their randomness properties. The applications of maximum-length linear feedback shift register sequences (LFSRS) include secure data transmission, multiple-address coding, error-correcting codes, radar range measurement, and random number generation. Their ideal correlation property and other randomness properties can be derived from the shift-and-add (SAA) property. The ultimate goal of completely understanding the theoretical behavior of LFSRS has not yet been achieved. A better understanding of LFSRS would improve the design and analysis of communication systems.

In  this  note  we  analyze  the  SAA  property  of  the  LFSRS.  An  algebraic  approach  is  used 
in  the  analysis  of  the  “unknown  shift”  of  the  sum  of  two  shifted  versions  of  the  LFSRS. 


II. SHIFT-AND-ADD PROPERTY


An LFSRS of span $n$ is a sequence of length $N = 2^n - 1$ containing all non-zero binary strings of length $n$. The LFSRS is generated by an $n$-stage shift register whose feedback polynomial is a primitive polynomial $f(x)$ of degree $n$ over the Galois field of two elements, GF(2). In this note, exponents are computed modulo $2^n - 1$, where $n$ is the degree of $f(x)$, and polynomials are computed modulo 2 and modulo the generating polynomial $f(x)$.

The  SAA  property  states  that  if  two  shifted  versions  of  the  same  LFSRS  are  added 
termwise  modulo  2,  the  resulting  sequence  is  also  a  shifted  version  of  the  same  sequence. 
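The following is a minimal Python sketch of this property. It assumes the primitive polynomial $f(x) = 1 + x + x^7$ that serves as the running example below. The sketch generates one period of the m-sequence from the recurrence $s_{k+7} = s_{k+1} + s_k$ (mod 2) and then tests a candidate pair of shifts by brute force. The function names are illustrative only.

```python
def m_sequence(exponents):
    """One period of the binary sequence generated by f(x) = sum of x^e.

    `exponents` lists the nonzero terms of f(x), e.g. (0, 1, 7) for
    f(x) = 1 + x + x^7.  With n = deg f, the recurrence is
    s[k + n] = XOR of s[k + e] over the lower exponents e.
    For a primitive f the period is 2^n - 1.
    """
    n = max(exponents)
    lower = [e for e in exponents if e != n]
    period = 2 ** n - 1
    s = [1] + [0] * (n - 1)              # any nonzero initial state works
    while len(s) < period:
        k = len(s) - n
        bit = 0
        for e in lower:
            bit ^= s[k + e]
        s.append(bit)
    return s


def shifted(s, t):
    """Cyclic shift {s_(k+t)} of one period s."""
    t %= len(s)
    return s[t:] + s[:t]


def is_saa_pair(s, i, j):
    """True if {s_k} + {s_(k+i)} + {s_(k+j)} is the zero sequence (mod 2)."""
    return all(a ^ b ^ c == 0 for a, b, c in zip(s, shifted(s, i), shifted(s, j)))


s = m_sequence((0, 1, 7))
print(len(s), sum(s))          # 127 64: one period, balanced (64 ones)
print(is_saa_pair(s, 1, 7))    # True: (1, 7) is the recurrence itself
```

The pair $(1,7)$ passes trivially because it is the recurrence itself. Pairs derived later in this note, such as $(5,54)$, can be checked the same way.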

DEFINITION: Let $n$, $i$ and $j$ be positive integers with $1 \le i, j \le 2^n - 2$. The ordered pair $(i, j)$ is a shift-and-add pair if and only if the modulo 2 sum sequence

$$\{s_k\} + \{s_{k+i}\} + \{s_{k+j}\} = \{0\},$$

the zero sequence. Here, $\{s_{k+t}\}$ is the cyclic shift by $t$ of the original sequence $\{s_k\}$.

The existence of SAA pairs for m-sequences follows from the shift-and-add property of m-sequences. In fact, for every given shift $i$ of the sequence $S$, the shift $j$ of the sequence $S$ is a unique function of $i$, that is, $j = \gamma(i)$. The function $\gamma$ that computes $j$ from $i$ exists but is difficult to determine. Four properties of $\gamma$ are as follows:

1. $\gamma(i)$ is a unique function of $i$.

Proof: This follows directly from the definition of SAA pairs. Otherwise, $\{s_k\}$ would not be of maximal period and thus would be generated by a polynomial of smaller degree than $n$.

2. $\gamma(\gamma(i)) = i$.

Proof: $x^{i} = 1 + x^{\gamma(i)}$ follows from the definition of $\gamma$. That is, $\gamma$ is an involution (it is its own inverse).

3. $\gamma(-i) = \gamma(i) - i$.

Proof: $1 + x^{-i} = (1 + x^{i})\,x^{-i} = x^{\gamma(i)}x^{-i} = x^{\gamma(i) - i}$.
NOTE: For small values of $n$, the SAA pairs $(i, \gamma(i))$ for the respective polynomial $f(x)$ are easily determined by analysis of the multiplicative group of the corresponding finite field. Alternative procedures to determine the SAA pairs, relative to a respective generating polynomial $f(x)$, also include long division of $(1 + x^{i})$ by $f(x)$ to determine the remainder.


Thus we see that, either by our original definition of SAA pairs or by the procedure just described, the sequence $S$ generated by a minimal polynomial generator $f(x)$ satisfies the relation

$$S + S^{i} + S^{\gamma(i)} = \{0\},$$

the zero sequence, or

$$f(x) \mid (1 + x^{i} + x^{\gamma(i)}).$$

The symbol “$f \mid g$” denotes that $g$ is a polynomial multiple of $f$ with coefficients in GF(2).


4. If $\gamma(i) = j$, then $\gamma(2i) = 2j$. Thus, $(2i, 2j)$ is also a SAA pair for $f(x)$.

Proof: Suppose $f(x) \mid (1 + x^{i} + x^{\gamma(i)})$. Then $f(x)$ also divides $(1 + x^{i} + x^{\gamma(i)})^2 = 1 + x^{2i} + x^{2\gamma(i)}$ in GF(2)[x].


Hence, if $i$ and $j = \gamma(i)$ belong to a pair of cyclotomic cosets of size $n$, then $n - 1$ additional SAA pairs can be computed easily by repeatedly squaring the polynomial relation.

Multiplication of the relation $(1 + x^{i} + x^{j})$ by $x^{-i}$ and by $x^{-j}$ generates the SAA pairs $(-i, j-i)$ and $(-j, i-j)$, respectively. Therefore, from one SAA pair it is possible to generate a total of $3n$ pairs with only $O(n)$ work.
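In the finite-field literature, the function $\gamma$ is known as a Zech (or Jacobi) logarithm: $x^{\gamma(i)} \equiv 1 + x^{i}$ modulo $f(x)$. The Python sketch below again assumes $f(x) = 1 + x + x^7$, and its helper names are hypothetical. It tabulates $\gamma$ by arithmetic on 7-bit integers and checks properties 2 to 4 for every $i$. It then builds the $3n$ pairs reachable from one pair by squaring and by multiplication with $x^{-i}$ and $x^{-j}$.

```python
def gamma_table(exponents):
    """Return {i: gamma(i)} with x^gamma(i) = 1 + x^i (mod 2, mod f(x)).

    `exponents` lists the nonzero terms of f(x); f is assumed primitive.
    A field element is an integer whose bit e is the coefficient of x^e.
    """
    n = max(exponents)
    N = 2 ** n - 1
    low = sum(1 << e for e in exponents if e != n)   # x^n = low terms (mod 2)
    log, power = {}, 1
    for k in range(N):
        log[power] = k
        power <<= 1                                  # multiply by x
        if power >> n:                               # reduce the x^n term
            power ^= (1 << n) | low
    antilog = {k: p for p, k in log.items()}
    return {i: log[1 ^ antilog[i]] for i in range(1, N)}


def family(i, j, n):
    """Unordered SAA pairs obtained from (i, j) by squaring and by
    multiplying 1 + x^i + x^j by x^-i and by x^-j."""
    N = 2 ** n - 1
    pairs = set()
    for a, b in [(i, j), (-i % N, (j - i) % N), (-j % N, (i - j) % N)]:
        for _ in range(n):
            pairs.add(frozenset((a, b)))
            a, b = 2 * a % N, 2 * b % N
    return pairs


n, N = 7, 2 ** 7 - 1
g = gamma_table((0, 1, 7))
for i in range(1, N):
    assert g[g[i]] == i                        # property 2: involution
    assert g[-i % N] == (g[i] - i) % N         # property 3
    assert g[2 * i % N] == 2 * g[i] % N        # property 4
print(g[1], g[5], g[9])                        # 7 54 90
print(len(family(1, 7, n)))                    # 21 = 3n
```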


III. GENERATING ADDITIONAL SAA PAIRS:

In the previous section we showed how to find $3n$ SAA pairs when one SAA pair is known. Here we show how to find additional SAA pairs.
EXAMPLE: Let $f(x) = 1 + x + x^7$. Thus, (1,7) is a SAA pair, and we determine that

SET R - Under squaring

(1,7) (2,14) (4,28) (8,56) (16,112) (32,97) (64,67)

are SAA pairs. Multiplying $f(x)$ by $x^{-1}$ we get:

SET S - Under multiplication by $x^{-1}$ and squaring

(6,126) (12,125) (24,123) (48,119) (96,111) (65,95) (3,63).

Multiplying $f(x)$ by $x^{-7}$ we get:

SET T - Under multiplication by $x^{-7}$ and squaring

(120,121) (113,115) (99,103) (71,79) (15,31) (30,62) (60,124).

The operations above are sufficient to determine only 21 of the 63 SAA pairs for the generator $f(x) = 1 + x + x^7$. The SAA mate for $i = 5$ has not yet been determined. By the following algebraic manipulations we can determine another SAA pair:

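$$
\begin{aligned}
1 + x^{5} &= x^{j}\\
(1 + x^{5})\,x &= x^{j+1}\\
x + x^{6} &= x^{j+1}\\
x + (1 + x^{126}) &= x^{j+1} && \text{since } (6,126) \text{ is a SAA pair}\\
(x + 1) + x^{126} &= x^{j+1}\\
x^{7} + x^{126} &= x^{j+1} && \text{since } (1,7) \text{ is a SAA pair}\\
x^{7}(1 + x^{119}) &= x^{j+1}\\
x^{7}\,x^{48} &= x^{j+1} && \text{since } (48,119) \text{ is a SAA pair}\\
x^{55} &= x^{j+1}\\
x^{54} &= x^{j}.
\end{aligned}
$$

Hence, (5,54) is a SAA pair. With this SAA pair we determine an additional 21 pairs by following the previous methods.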
SET R

(5,54) (10,108) (20,89) (40,51) (80,102) (33,77) (66,27)

SET S

(49,122) (98,117) (69,107) (11,87) (22,47) (44,94) (88,61)

SET T

(73,78) (19,29) (38,58) (76,116) (25,105) (50,83) (100,39)

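Finally, we find the last “family” of SAA pairs. The SAA mate for $i = 9$ can be found in the same manner as for $i = 5$, as follows:

$$
\begin{aligned}
1 + x^{9} &= x^{j}\\
(1 + x^{9})\,x^{7} &= x^{7+j}\\
x^{7} + x^{16} &= x^{7+j}\\
x^{7} + (1 + x^{112}) &= x^{7+j} && \text{since } (16,112) \text{ is a SAA pair}\\
(x^{7} + 1) + x^{112} &= x^{7+j}\\
x + x^{112} &= x^{7+j} && \text{since } (1,7) \text{ is a SAA pair}\\
x(1 + x^{111}) &= x^{7+j}\\
x\,x^{96} &= x^{7+j} && \text{since } (96,111) \text{ is a SAA pair}\\
x^{97} &= x^{7+j}\\
x^{90} &= x^{j}.
\end{aligned}
$$

Hence, (9,90) is a SAA pair.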

SET R

(9,90) (18,53) (36,106) (72,85) (17,43) (34,86) (68,45)

SET S

(81,118) (35,109) (70,91) (13,55) (26,110) (52,93) (104,59)

SET T

(37,46) (74,92) (21,57) (42,114) (84,101) (41,75) (82,23)


Knowing  only  the  initial  SAA  pair  (1,7),  all  63  SAA  pairs  are  derived.  The  above  procedure 
has  thus  been  used  to  compute  all  the  SAA  pairs  for  f(x)  in  a  relatively  efficient  manner. 
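The bookkeeping in this example can be sketched in Python as follows. The helper names are hypothetical, and $f(x) = 1 + x + x^7$ is assumed as above. The sketch does not reproduce the hand manipulations that located the mates of $i = 5$ and $i = 9$. Instead, it reads $\gamma$ from a Zech-logarithm table and partitions all SAA pairs into the families produced by sets R, S and T. The output should show three families of 21 pairs each.

```python
def gamma_table(exponents):
    """{i: gamma(i)} with x^gamma(i) = 1 + x^i mod f(x); f assumed primitive."""
    n = max(exponents)
    N = 2 ** n - 1
    low = sum(1 << e for e in exponents if e != n)
    log, power = {}, 1
    for k in range(N):
        log[power] = k
        power <<= 1
        if power >> n:
            power ^= (1 << n) | low
    antilog = {k: p for p, k in log.items()}
    return {i: log[1 ^ antilog[i]] for i in range(1, N)}


def family(i, j, n):
    """Unordered pairs in the R/S/T family of the SAA pair (i, j)."""
    N = 2 ** n - 1
    pairs = set()
    for a, b in [(i, j), (-i % N, (j - i) % N), (-j % N, (i - j) % N)]:
        for _ in range(n):
            pairs.add(frozenset((a, b)))
            a, b = 2 * a % N, 2 * b % N
    return pairs


def split_into_families(exponents):
    """Cover all SAA pairs of f(x) by families, taking the smallest
    uncovered shift i each time; returns (i, gamma(i), family size)."""
    n = max(exponents)
    g = gamma_table(exponents)
    covered, out = set(), []
    for i in sorted(g):
        if i in covered:
            continue
        fam = family(i, g[i], n)
        out.append((i, g[i], len(fam)))
        for pair in fam:
            covered |= pair
    return out


print(split_into_families((0, 1, 7)))   # [(1, 7, 21), (5, 54, 21), (9, 90, 21)]
```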
IV. STRUCTURAL PROPERTIES OF SAA PAIRS:


Here  we  generalize  the  previous  example  and  state  and  prove  propositions  that  can  be  used 
to  efficiently  solve  for  the  unknown  families  of  SAA  pairs. 

Proposition I: Let $(j, \gamma(j))$ be a SAA pair. Then

$$\gamma(\gamma(j) + j) = \gamma(\gamma(j) - 2j) + 2j. \qquad (1)$$

Proof.

$$
\begin{aligned}
1 + x^{\gamma(j)+j} &= 1 + (1 + x^{j})\,x^{j}\\
&= 1 + x^{j} + x^{2j}\\
&= x^{\gamma(j)} + x^{2j}\\
&= (x^{\gamma(j)-2j} + 1)\,x^{2j}\\
&= x^{\gamma(\gamma(j)-2j)}\,x^{2j},
\end{aligned}
$$

from which the result follows easily.

EXAMPLE: For $n = 7$, assume that the SAA pair (1,7) and the family of 21 SAA pairs associated with it are known. Let $j = 1$ and $\gamma(j) = 7$. Substitution into Eq. (1) yields $\gamma(7 + 1) = \gamma(7 - 2) + 2$. Then $\gamma(8) = \gamma(5) + 2$, and since $\gamma(8) = 56$, we obtain $\gamma(5) = 56 - 2 = 54$.

It is only necessary to check one SAA pair from a given family to determine whether a previously unknown family can be generated using Proposition I. This is shown as follows:

Let $(i_1, j_1)$ be a SAA pair and let $I = \{(i_k, j_k) \mid k = 1, \dots, n\}$ be the set of SAA pairs formed under squaring. The set $L = \{i_k + j_k \mid k = 1, \dots, n\}$ forms a cyclotomic coset. Similarly, $M = \{i_k - 2j_k \mid k = 1, \dots, n\}$ and $Q = \{j_k - 2i_k \mid k = 1, \dots, n\}$ form cyclotomic cosets. As previously shown, if $(i_1, j_1)$ is a SAA pair then $(-i_1, j_1 - i_1)$ and $(-j_1, i_1 - j_1)$ are SAA pairs. It follows that, with each set running over $k = 1, \dots, n$,

$$\{-i_k + (j_k - i_k)\} = Q, \quad \{(j_k - i_k) - 2(-i_k)\} = L, \quad \{-i_k - 2(j_k - i_k)\} = M,$$
$$\{-j_k + (i_k - j_k)\} = M, \quad \{(i_k - j_k) - 2(-j_k)\} = L, \quad \{-j_k - 2(i_k - j_k)\} = Q.$$

Thus, for every pair of the family, the arguments appearing in Eq. (1) lie in the same three cosets $L$, $M$ and $Q$.
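One pass of Proposition I over a partially known table can be sketched in Python as below. The names are hypothetical. The sketch assumes $f(x) = 1 + x + x^7$, with the 21 pairs of the family of $(1,7)$ known. For each known $j$, whichever side of Eq. (1) is known determines the other.

```python
def family_table(i, j, n):
    """{a: gamma(a)} for the 3n pairs reachable from (i, j) by squaring and
    by multiplying 1 + x^i + x^j by x^-i and x^-j (both orientations)."""
    N = 2 ** n - 1
    table = {}
    for a, b in [(i, j), (-i % N, (j - i) % N), (-j % N, (i - j) % N)]:
        for _ in range(n):
            table[a], table[b] = b, a
            a, b = 2 * a % N, 2 * b % N
    return table


def apply_proposition_I(known, N):
    """New entries implied by Eq. (1):
    gamma(gamma(j) + j) = gamma(gamma(j) - 2j) + 2j."""
    new = {}
    for j, gj in known.items():
        a, b = (gj + j) % N, (gj - 2 * j) % N
        if a == 0 or b == 0:
            continue                      # degenerate: 1 + x^0 = 0 has no log
        if a in known and b not in known:
            new[b] = (known[a] - 2 * j) % N
        elif b in known and a not in known:
            new[a] = (known[b] + 2 * j) % N
    return new


known = family_table(1, 7, 7)
new = apply_proposition_I(known, 127)
print(new[5])                              # 54, as in the example above
```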


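Proposition II: Let $(i, \gamma(i))$ and $(j, \gamma(j))$ be SAA pairs. Then

$$\gamma(\gamma(j) + i) = \gamma(j + i - \gamma(i)) + \gamma(i). \qquad (2)$$

Proof:

$$
\begin{aligned}
1 + x^{\gamma(j)+i} &= 1 + (1 + x^{j})\,x^{i}\\
&= 1 + x^{i} + x^{j+i}\\
&= x^{\gamma(i)} + x^{j+i}\\
&= (x^{j+i-\gamma(i)} + 1)\,x^{\gamma(i)}\\
&= x^{\gamma(j+i-\gamma(i))}\,x^{\gamma(i)},
\end{aligned}
$$

from which the result follows easily.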

EXAMPLE: For $n = 7$, let the SAA pair (1,7) and the family of 21 SAA pairs associated with it be given. Hence, (1,7) and (4,28) are known SAA pairs. Let $i = 1$ and $j = 4$. Substitution into Eq. (2) yields $\gamma(28 + 1) = \gamma(4 + 1 - 7) + 7$. Then $\gamma(29) = \gamma(125) + 7$, and since $\gamma(125) = 12$, we obtain $\gamma(29) = 12 + 7 = 19$.
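Eq. (2) can be checked exhaustively for the running example. The Python sketch below uses hypothetical names. It rebuilds the Zech-logarithm table for $f(x) = 1 + x + x^7$ and tests Proposition II for every ordered pair $(i, j)$. The degenerate case $\gamma(j) + i \equiv 0$ is skipped, because $1 + x^{0} = 0$ has no logarithm there. The proof above shows that the right-hand argument $j + i - \gamma(i)$ then also vanishes.

```python
def gamma_table(exponents):
    """{i: gamma(i)} with x^gamma(i) = 1 + x^i mod f(x); f assumed primitive."""
    n = max(exponents)
    N = 2 ** n - 1
    low = sum(1 << e for e in exponents if e != n)
    log, power = {}, 1
    for k in range(N):
        log[power] = k
        power <<= 1
        if power >> n:
            power ^= (1 << n) | low
    antilog = {k: p for p, k in log.items()}
    return {i: log[1 ^ antilog[i]] for i in range(1, N)}


def check_proposition_II(g, N):
    """Assert Eq. (2) for all ordered pairs; return how many
    non-degenerate cases were checked."""
    checked = 0
    for i in g:
        for j in g:
            lhs_arg = (g[j] + i) % N
            rhs_arg = (j + i - g[i]) % N
            if lhs_arg == 0:
                assert rhs_arg == 0          # both sides degenerate together
                continue
            assert g[lhs_arg] == (g[rhs_arg] + g[i]) % N
            checked += 1
    return checked


N = 127
g = gamma_table((0, 1, 7))
print(check_proposition_II(g, N))   # number of non-degenerate cases checked
print(g[29], (g[125] + 7) % N)      # 19 19, matching the example
```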

A  family  and  the  associated  SAA  pairs  are  defined  to  be  “generators”  if  the  SAA  pairs  within 
this  family  generate  at  least  one  SAA  pair  from  another  family  using  proposition  II. 

NOTE: The application of Proposition II requires the input of two ordered SAA pairs, where the order of the input is important and repetition of a SAA pair is allowed. If (i, j) is a SAA pair, then the four inputs (i, j), (j, i); (i, j), (i, j); (j, i), (j, i); and (j, i), (i, j) into Proposition II provide distinct relationships. A given family contains $3n$ SAA pairs, each of which may be taken in either order. Therefore, within a given family there are $(6n)^2 = 36n^2$ possible combinations to substitute into Proposition II. However, it is only necessary to consider the input of $18n$ combinations to determine whether a previously unknown family can be generated using Proposition II. This is shown as follows:
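Let $J = \{(i_p, \gamma(i_p)) \mid p = 1, \dots, n\}$ and $I = \{(j_k, \gamma(j_k)) \mid k = 1, \dots, n\}$ be sets of SAA pairs formed under squaring. Consider the relation that Proposition II produces from the inputs $(i_p, \gamma(i_p))$ and $(j_k, \gamma(j_k))$. Squaring it gives the relation produced from $(i_{p+1}, \gamma(i_{p+1}))$ and $(j_{k+1}, \gamma(j_{k+1}))$. Hence the relations obtained from all index pairs $(p, k)$ coincide, up to squaring, with those obtained with $k = 1$ fixed. As a result, only $36n$ of the possible $36n^2$ combinations need to be considered when applying Proposition II. Furthermore, the input of $(i, \gamma(i))$ and $(\gamma(i), i)$ generates the same relationships as the input of $(i - \gamma(i), -\gamma(i))$ and $(i, \gamma(i))$. It follows that only $18n$ unique combinations remain.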

The effect of the $18n$ combinations as input to Proposition II is to produce relations that yield new SAA pairs. We shall see that on some occasions no new SAA pairs are derived from the original SAA pair by applying Proposition II.

To return to the previous example, the initial condition for the application of Proposition II was that exactly one SAA pair is known, along with its $3n$ relatives from procedures R, S and T. We saw that from the SAA pair (1,7) and its relatives it is possible, by Proposition II, to find all 63 SAA pairs for the polynomial $f(x) = x^7 + x + 1$. In fact, no matter which irreducible polynomial of degree $n = 7$ is chosen, Proposition II is sufficient to find all 63 SAA pairs. However, this is not the case for every value of $n$. For larger values of the degree $n$ we are sometimes unable to find all SAA pairs by applying Proposition II to a given SAA pair and its relatives. In the sequel we discuss the probability with which it is possible to complete the table of SAA pairs from a given SAA pair.
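The completion procedure described above can be sketched in Python as a fixed-point iteration. The names are hypothetical, and this is an illustration rather than the authors' program. Starting from a single SAA pair, every newly found pair is expanded into its $3n$ relatives, and Eq. (2) is applied to all ordered pairs of known entries until nothing new appears. For $f(x) = 1 + x + x^7$ and the pair $(1,7)$, it should recover all 63 pairs.

```python
def complete_from_pair(i0, j0, n):
    """Grow {i: gamma(i)} from one SAA pair using the R/S/T relatives and
    repeated application of Proposition II, Eq. (2)."""
    N = 2 ** n - 1
    known = {}

    def add_family(i, j):
        # record (i, j) and its 3n relatives in both orientations
        added = False
        for a, b in [(i, j), (-i % N, (j - i) % N), (-j % N, (i - j) % N)]:
            for _ in range(n):
                for x, y in ((a, b), (b, a)):
                    if x not in known:
                        known[x] = y
                        added = True
                a, b = 2 * a % N, 2 * b % N
        return added

    add_family(i0, j0)
    changed = True
    while changed:
        changed = False
        for i, gi in list(known.items()):
            for j, gj in list(known.items()):
                lhs, rhs = (gj + i) % N, (j + i - gi) % N
                if lhs == 0:
                    continue                    # degenerate: 1 + x^0 = 0
                if lhs in known and rhs not in known:
                    changed |= add_family(rhs, (known[lhs] - gi) % N)
                elif rhs in known and lhs not in known:
                    changed |= add_family(lhs, (known[rhs] + gi) % N)
    return known


table = complete_from_pair(1, 7, 7)
print(len(table))    # 126 shifts, i.e. all 63 SAA pairs
```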


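Proposition III: Let $(i, \gamma(i))$, $(j, \gamma(j))$ and $(k, \gamma(k))$ be SAA pairs. Then

$$\gamma(\gamma(i) + \gamma(j) + \gamma(k)) = \gamma\{i + \gamma(j) + \gamma(k) - \gamma(\gamma(j) + \gamma(k))\} + \gamma(\gamma(j) + \gamma(k)). \qquad (3)$$

Proof:

$$
\begin{aligned}
1 + x^{\gamma(i)+\gamma(j)+\gamma(k)} &= 1 + (1 + x^{i})\,x^{\gamma(j)}x^{\gamma(k)}\\
&= 1 + x^{\gamma(j)+\gamma(k)} + x^{i+\gamma(j)+\gamma(k)}\\
&= x^{\gamma(\gamma(j)+\gamma(k))} + x^{i+\gamma(j)+\gamma(k)}\\
&= \left(1 + x^{i+\gamma(j)+\gamma(k)-\gamma(\gamma(j)+\gamma(k))}\right) x^{\gamma(\gamma(j)+\gamma(k))}\\
&= x^{\gamma\{i+\gamma(j)+\gamma(k)-\gamma(\gamma(j)+\gamma(k))\}+\gamma(\gamma(j)+\gamma(k))},
\end{aligned}
$$

from which the result follows directly.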
EXAMPLE: For $n = 7$, let the SAA pair (1,7) and the family of 21 SAA pairs associated with it be known. Hence, (99,103), (65,95) and (15,31) are known SAA pairs. Let $j = 99$, $i = 65$ and $k = 15$. Substitution into Eq. (3) yields $\gamma(103 + 95 + 31) = \gamma(65 + 103 + 31 - \gamma(103 + 31)) + \gamma(103 + 31)$. Since $103 + 31 \equiv 7$ and $\gamma(7) = 1$, this gives $\gamma(102) = \gamma(71) + 1$, and hence $\gamma(102) = 79 + 1 = 80$.
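Both sides of Eq. (3) can be evaluated numerically in the same way. The Python sketch below uses hypothetical names and assumes $f(x) = 1 + x + x^7$. It reproduces the example and spot-checks a grid of triples. Degenerate cases, where an argument of $\gamma$ is $0 \bmod 127$, are skipped.

```python
def gamma_table(exponents):
    """{i: gamma(i)} with x^gamma(i) = 1 + x^i mod f(x); f assumed primitive."""
    n = max(exponents)
    N = 2 ** n - 1
    low = sum(1 << e for e in exponents if e != n)
    log, power = {}, 1
    for k in range(N):
        log[power] = k
        power <<= 1
        if power >> n:
            power ^= (1 << n) | low
    antilog = {k: p for p, k in log.items()}
    return {i: log[1 ^ antilog[i]] for i in range(1, N)}


def proposition_III_sides(g, i, j, k, N):
    """(left side, right side) of Eq. (3), or None in a degenerate case."""
    s = (g[j] + g[k]) % N                 # gamma(j) + gamma(k)
    lhs_arg = (g[i] + s) % N
    if s == 0 or lhs_arg == 0:
        return None
    inner = (i + s - g[s]) % N            # nonzero whenever lhs_arg is
    return g[lhs_arg], (g[inner] + g[s]) % N


g = gamma_table((0, 1, 7))
print(proposition_III_sides(g, 65, 99, 15, 127))   # (80, 80)
```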

If we let $\gamma(j) + \gamma(k) = K$, where $K$ is an element in a family, then substituting $K$ into Proposition III transforms it into Proposition II. Hence, Proposition III and Proposition II are equivalent. As additional terms are included in the right-hand side of Proposition II, equivalent relationships are created which do not provide further information.

V.  RESULTS: 

Proposition II has been used to generate SAA pairs for every PN sequence up to $n = 14$.

Not  every  SAA  pair  from  a  PN  sequence  possesses  the  necessary  structure  to  generate  all  the 
remaining  SAA  pairs.  At  this  time,  it  cannot  be  determined  a  priori  which  initial  SAA  pairs 
will  be  complete  generators  of  all  SAA  pairs.  However,  for  a  SAA  pair  to  be  a  generator  the 
following  conditions  must  be  satisfied: 

Let $(i, j)$ be a SAA pair that forms the family $\{(l_t, h_t) \mid t = 1, \dots, q\}$, where $q$ is the size of the family and each $(l_t, h_t)$ is a SAA pair. Generate the following set of 2-tuples:

$$\{\, (l_t + i,\ h_t + (i - j)),\ (l_t + j,\ h_t + (j - i)),\ (l_t + (-i),\ h_t + (-j)),$$
$$(h_t + i,\ l_t + (i - j)),\ (h_t + j,\ l_t + (j - i)),\ (h_t + (-i),\ l_t + (-j)) \,\}.$$

If exactly one element of any 2-tuple is a member of the family, then every SAA pair in this family is a generator. If, for every 2-tuple, either both elements or neither element is a member of the family, then no SAA pair in the family is a generator. This follows from Proposition II. Upon closer examination, these 2-tuples are created by adding $i$, $j$, $(i-j)$, $(j-i)$, $-i$, and $-j$ to each member of $\{(l_t, h_t) \mid t = 1, \dots, q\}$. These transformations map each element of the pair from one cyclotomic coset to another cyclotomic coset. It is this mapping that determines which SAA pairs will be generators. The function that computes this mapping exists. However, it is difficult to determine.
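The first criterion is easy to mechanize. The Python sketch below uses hypothetical names and is an illustration of the stated criterion, not the authors' program. It builds the family of $(i, j)$ and forms the six 2-tuples for every member $(l_t, h_t)$. It then reports whether some 2-tuple has exactly one element among the shifts of the family. Each 2-tuple is the pair of arguments on the two sides of Eq. (2) for suitable inputs, so such a tuple yields a pair from another family.

```python
def family_pairs(i, j, n):
    """Ordered representatives (l_t, h_t) of the R/S/T family of (i, j)."""
    N = 2 ** n - 1
    pairs = set()
    for a, b in [(i, j), (-i % N, (j - i) % N), (-j % N, (i - j) % N)]:
        for _ in range(n):
            pairs.add((a, b))
            a, b = 2 * a % N, 2 * b % N
    return pairs


def is_generator(i, j, n):
    """True if some 2-tuple has exactly one element in the family of (i, j)."""
    N = 2 ** n - 1
    fam = family_pairs(i, j, n)
    members = {e for pair in fam for e in pair}
    shifts = [(i, (i - j) % N), (j, (j - i) % N), (-i % N, -j % N)]
    for l, h in fam:
        for u, v in shifts:
            for first, second in (((l + u) % N, (h + v) % N),
                                  ((h + u) % N, (l + v) % N)):
                if (first in members) != (second in members):
                    return True
    return False


print([is_generator(i, j, 7) for i, j in ((1, 7), (5, 54), (9, 90))])
# [True, True, True]
```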

If a SAA pair is a generator, Proposition II provides a tremendous reduction in the calculations needed to find the set of SAA pairs for a given LFSR. An analysis and determination of which SAA pairs are generators has been carried out for values of $n$ up to 14.

The numbers of cyclotomic cosets, SAA pairs, and families (including their sizes) have been tabulated below:
The number and percentage of SAA pairs that generate SAA pairs of families other than their own are:

n = 7: 63 (100%)
n = 8: 72 (61%)
n = 9: 162 (64%)
n = 10: 285 (56%)
n = 11: 462 (45%)
n = 12: 720 (35%)
n = 13: 819 (20%)
n = 14: 1071 (13%)

These numbers are identical for every LFSRS of the same degree. However, as $n \to \infty$ the probability that a randomly selected SAA pair generates all families of SAA pairs should approach zero.

We note that more can be gleaned from Proposition II if two independent SAA pairs are given. If two or more SAA pairs from different families are known, then the new family becomes the union of the families from each of the separate SAA pairs. Thus, when non-generator families are combined, the structure can be sufficient to generate additional SAA pairs even when each SAA pair alone is not a generator.
EXAMPLE:  Let  f(x)  =  x^  +  x^  +  1.  Assume  that  the  SAA  pairs  (30,  66)  and  (105,  248)
have  been  selected  at  random.  Using  proposition  II,  no  additional  SAA  pairs  can  be 
generated  by  using  each  SAA  pair  alone.  However,  when  the  two  individual  families  are 
combined  into  one  larger  family,  the  entire  set  of  SAA  pairs  for  f(x)  can  be  generated  using 
proposition  II. 

For $f(x)$ of degree $n$, assume that two SAA pairs $(i, j)$ and $(l, k)$ of different families have been randomly selected. Then the probabilities that these SAA pairs will generate the remaining SAA pairs for the associated LFSRS are:

n = 7: 100%
n = 8: 99%
n = 9: 99%
n = 10: 99%
n = 11: 100%
n = 12: 100%
n = 13: 93%
n = 14: 70%


The probabilities have not been calculated when three or more SAA pairs of different families are used to generate the remaining SAA pairs for the associated LFSRS.

Additional research is needed to discover further structure of SAA pairs. Ultimately we would like to determine the entire nature of the function $\gamma: i \mapsto j$.


UNANSWERED QUESTION

Can it be determined a priori which cyclotomic coset each member of the set $\{2^{p} i + k \mid p = 1, \dots, n\}$ will belong to for a given $i$ or $k$?
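For experimenting with this question, the coset of any shift is easy to compute. The small Python sketch below uses hypothetical names. It takes the representative of the cyclotomic coset of $x$ modulo $2^n - 1$ to be its smallest element, $\min_t 2^t x \bmod (2^n - 1)$, and lists the representatives of $2^p i + k$ for $p = 1, \dots, n$.

```python
def coset_rep(x, n):
    """Smallest element of the cyclotomic coset {2^t * x mod (2^n - 1)}."""
    N = 2 ** n - 1
    return min((x << t) % N for t in range(n))


def coset_labels(i, k, n):
    """Coset representatives of 2^p * i + k for p = 1, ..., n."""
    return [coset_rep((2 ** p) * i + k, n) for p in range(1, n + 1)]


print(coset_labels(1, 7, 7))   # [9, 11, 15, 23, 29, 15, 1]
```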
Abstract

A multi-stage pseudo-continuous-time state-space method is developed for designing large-scale discrete systems that do not explicitly exhibit a two- or multi-time scale structure. The designed pseudo-continuous-time regulator places the eigenvalues of the closed-loop discrete system near the common region of a circle (concentric within the unit circle) and a logarithmic spiral in the complex z-plane, without explicitly utilizing the open-loop eigenvalues of the given system. The proposed method requires only the solution of small-order Riccati equations at each stage of the design. Based on matching all the states at all the sampling instants, a new digital redesign technique is presented for finding the pseudo-continuous-time quadratic regulator. An illustrative example is presented to demonstrate the effectiveness of the proposed procedures.
1. Introduction

Physical realizations of engineering systems result, in general, in large-scale models. In most cases, it is quite impractical to consider the analysis and design of the large-scale system model itself. Therefore, a necessity arises for decomposing the original system into decoupled subsystems, each with its own distinct characteristics, so that the resulting model has a completely decoupled multi-time scale structure. Some of the existing approaches for decomposition of large-scale systems are aggregation [3], multi-time scales [9] and modal analysis [15]. However, most of these appear to be restricted to continuous-time systems. The corresponding problem for large-scale discrete-time systems has received very little attention [11,12,14]. Mahmoud et al. [11] derived a matrix norm condition for separating large-scale discrete-time systems into two-time scales without originally assuming the availability of such a structure. However, it might not always be computationally feasible to satisfy this condition. Shieh et al. [18] have developed an algebraic method based on the matrix sign function [16] for separating the slow (dominant) modes from the fast (non-dominant) modes (two-time scale structure) of a large-scale multivariable system (continuous and discrete). The matrix sign function algorithm has been used for the following: block-diagonalization and block-triangularization [17] of a large-scale system, i.e., decomposing the system into parallel and cascaded structures; solving non-linear Riccati equations, which often appear in feedback design of systems based on linear quadratic theory; and model conversions of systems via the computation of the principal $q$th root of the system matrix [21,24]. Recently, fast and stable algorithms have been developed for the computation of the matrix sign function [21] and for the computation of the principal $q$th root of a complex matrix [24], which in turn can be used for discrete-to-continuous model conversion. These algorithms will be utilized in the development of our multi-stage design procedure for designing suboptimal discrete controllers with pole assignment near a specified region of the complex z-plane.

The optimal linear quadratic (LQ) design method has several good properties. For instance, the closed-loop system is stable and has good robustness properties, provided the weighting matrices satisfy certain positivity conditions [2]. The transient behavior of the closed-loop system is, however, difficult to determine, since there is a complex relation between the weighting matrices and the closed-loop poles. This implies that the weighting matrices have to be determined through trial and error. Pole placement methods have the advantage that the closed-loop poles can be specified. The drawback is the non-uniqueness of the choice of feedback for multivariable systems. Further, it is too restrictive to place the poles in predetermined locations [1], since for non-linear systems the exact locations of the closed-loop poles might be difficult to attain for each operating condition. Hence, in general, it suffices to have the poles placed within a specified region. Also, the regional pole assignment method is suited for tradeoffs between eigenvalue locations, actuator signal magnitudes and requirements of robustness against large parameter variations, sensor failures, implementation accuracies, gain reduction, etc. [1]. In this paper, we consider the common region of a circle and a logarithmic spiral in the z-plane (Fig. 2) for pole assignment. This is equivalent to the sector region (hatched) in Fig. 1 in the s-plane. It is well known that if the poles of a system lie within the above-mentioned region(s), then the system responses converge at an appropriate speed and any existing vibrating modes are well damped.
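The correspondence between the two figures comes from the sampling map $z = e^{sT}$, where $T$ is the sampling period. The Python sketch below uses illustrative values of $T$, $h$ and the sector half-angle $\theta$; these values are assumptions, not taken from the paper. It checks two facts numerically. First, the vertical line $\mathrm{Re}\,s = -h$ maps onto the circle $|z| = e^{-hT}$. Second, a ray at angle $\theta$ from the negative real axis maps onto a logarithmic spiral.

```python
import cmath
import math


def s_to_z(s, T):
    """Map an s-plane pole to the z-plane under z = exp(sT)."""
    return cmath.exp(s * T)


T, h, theta = 0.1, 2.0, math.radians(45)     # illustrative values only

# The vertical line Re(s) = -h maps onto the circle |z| = exp(-hT).
for w in (0.0, 3.0, -7.5):
    assert abs(abs(s_to_z(complex(-h, w), T)) - math.exp(-h * T)) < 1e-12

# A ray s = r(-cos(theta) + i sin(theta)) maps onto the logarithmic spiral
# ln|z| = -arg(z) / tan(theta), while arg(z) stays inside (-pi, pi].
for r in (0.5, 5.0, 20.0):
    z = s_to_z(r * complex(-math.cos(theta), math.sin(theta)), T)
    assert abs(math.log(abs(z)) + cmath.phase(z) / math.tan(theta)) < 1e-9
```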

The problem of designing feedback gains to optimally place all the poles of a closed-loop system within a specified region was first studied by Anderson and Moore [2]. They used a shifted system matrix to obtain an optimal closed-loop system whose eigenvalues lie in the open region to the left of a vertical line on the negative real axis. Shieh et al. [19,22] extended this idea to optimally place the poles within a vertical strip as well as a horizontal strip in the left-half plane. Kawasaki and Shimemura [8] proposed an iterative procedure to place the poles inside a hyperbola in the left-half plane, which is actually an approximation of the sector region shown in Fig. 1. In [23], a pseudo-continuous-time method has been developed to place the eigenvalues of a discrete system (having a sufficiently small sampling period) within the hatched region of Fig. 2. However, it involves the solution of full-order Riccati equations, which could be computationally difficult for large-scale systems. The Luenberger transformation, which is sometimes numerically unstable, is utilized to transform the full-order discrete-time system to its equivalent canonical form so as to determine the pole-placement discrete feedback gain. In this paper, only reduced-order Riccati equations need to be solved at each stage of the design, and the transformation to general canonical forms is avoided.

For digital implementation of the designed continuous-time controller, the continuous-time controller (analog controller) needs to be converted into an equivalent discrete-time controller (digital controller). This is a digital redesign problem [10]. Based on a bilinear transform method, a new digital redesign technique is presented for finding the equivalent digital controller from the designed analog controller.

The material in this paper is organized as follows. Section 2 reviews the results associated with the design of a linear quadratic regulator that optimally places the closed-loop eigenvalues of a continuous-time system on or within the hatched region of Fig. 1. In Section 3, a new digital redesign technique is presented for converting the continuous-time control law to an equivalent discrete-time control law. In Section 4, a method using the matrix sign function is introduced for block-decomposing the equivalent large-scale continuous-time system into a multi-time scale structure. Then, a pseudo-continuous-time multi-stage design procedure is presented for designing large-scale discrete systems with pole placement near the hatched region of Fig. 2. An illustrative example is given in Section 5 to demonstrate the effectiveness of the proposed design procedure, and the conclusions are summarized in Section 6. Some computational algorithms are given in an appendix.

2.  Continuous-time  optimal  quadratic  regulators  with  pole  placement 

Consider  the  linear  controllable  continuous-time  system  described  by 

$$\dot{x}_c(t) = A x_c(t) + B u_c(t), \qquad x_c(0) \qquad (1)$$

where $x_c(t)$ and $u_c(t)$ are the $n \times 1$ state vector and the $m \times 1$ input vector, respectively, and $A$ and $B$ are constant matrices of appropriate dimensions. Let the quadratic cost function for the system in (1) be

$$J = \int_0^\infty \left( x_c^T(t)\, Q\, x_c(t) + u_c^T(t)\, R\, u_c(t) \right) dt \qquad (2)$$

where the weighting matrices $Q$ and $R$ are $n \times n$ non-negative definite and $m \times m$ positive definite symmetric matrices, respectively. The feedback control law that minimizes the performance index in (2) is given by [2]

$$u_c(t) = -K_c x_c(t) + E_c r(t) = -R^{-1} B^T P x_c(t) + E_c r(t) \qquad (3)$$
where $K_c$ is an $m \times n$ feedback gain, $E_c$ is an $m \times m$ forward gain, $r(t)$ is a reference input, and $P$, an $n \times n$ non-negative definite symmetric matrix, is the solution of the Riccati equation,

$$P B R^{-1} B^T P - P A - A^T P - Q = 0_n \qquad (4)$$

with $(Q, A)$ detectable. The superscript $T$ and the matrix $0_n$ denote the transpose and the $n \times n$ null matrix, respectively. Thus the resulting closed-loop system becomes

$$\dot{x}_c(t) = (A - B K_c)\, x_c(t) + B E_c r(t) \qquad (5)$$

The eigenvalues of $A - BK_c$, denoted by $\sigma(A - BK_c)$, lie in the open left-half plane of the complex s-plane. Our objective is to determine $Q$, $R$ and $K_c$ so that the closed-loop system in (5) has its eigenvalues on or within the hatched region of Fig. 1. The important results, along with the design procedure to achieve the desired design, are presented in the following.

Lemma 1 [2,23]: Let $(A, B)$ be the pair of the given open-loop system in (1). Also, let $h > 0$ represent the prescribed degree of relative stability. Then the eigenvalues of the closed-loop system $A - BR^{-1}B^TP$ lie to the left of the $-h$ vertical line, with the matrix $P$ being the solution of the Riccati equation,

$$P B R^{-1} B^T P - P(A + hI_n) - (A + hI_n)^T P = 0_n \qquad (6)$$

where the matrix $I_n$ is an $n \times n$ identity matrix. •
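Lemma 1 can be illustrated with a minimal Python sketch that assumes NumPy. The Riccati equation is solved by the eigenvalue-eigenvector approach: the stable invariant subspace $[X_1; X_2]$ of the Hamiltonian matrix gives $P = X_2 X_1^{-1}$. Eq. (6) is the special case with $A$ replaced by $A + hI_n$ and zero state weighting. The helper names and the second-order example system are hypothetical. The sketch also assumes that the Hamiltonian has no imaginary-axis eigenvalues and has distinct stable eigenvalues.

```python
import numpy as np


def care_stabilizing(A, B, Q, R):
    """Stabilizing solution P of A^T P + P A - P B R^-1 B^T P + Q = 0.

    Eigenvector method: if the columns of [X1; X2] span the stable invariant
    subspace of H = [[A, -G], [-Q, -A^T]], G = B R^-1 B^T, then P = X2 X1^-1.
    """
    n = A.shape[0]
    G = B @ np.linalg.solve(R, B.T)
    H = np.block([[A, -G], [-Q, -A.T]])
    w, V = np.linalg.eig(H)
    X = V[:, w.real < 0]                       # the n stable eigenvectors
    P = np.real(X[n:, :] @ np.linalg.inv(X[:n, :]))
    return (P + P.T) / 2                       # remove round-off asymmetry


def shifted_regulator(A, B, R, h):
    """Lemma 1: solve Eq. (6), the Riccati equation for A + h I with zero
    state weighting, and return P and the gain K = R^-1 B^T P."""
    n = A.shape[0]
    P = care_stabilizing(A + h * np.eye(n), B, np.zeros((n, n)), R)
    K = np.linalg.solve(R, B.T @ P)
    return P, K


A = np.array([[0.0, 1.0], [3.0, -2.0]])       # open-loop eigenvalues 1 and -3
B = np.array([[0.0], [1.0]])
R = np.array([[1.0]])
h = 0.5
P, K = shifted_regulator(A, B, R, h)
print(np.linalg.eigvals(A - B @ K))           # -3 and -2 (in some order)
```

With zero state weighting, the shifted closed-loop matrix keeps the stable eigenvalue of $A + hI_n$ and reflects the unstable one. Both closed-loop poles therefore end up to the left of $-h$.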

Theorem 1 [23]: Let the given stable system matrix $A \in \mathbb{R}^{n \times n}$ have eigenvalues $\lambda_i^-$ $(i = 1, \dots, n^-)$ lying in the open sector of Fig. 1 and eigenvalues $\lambda_i^+$ $(i = 1, \dots, n^+)$ outside that sector, with $n = n^- + n^+$. Now, consider the two Riccati equations,

$$Q B R^{-1} B^T Q - Q(-A^2) - (-A^2)^T Q = 0_n \qquad (7a)$$

and

$$P B R^{-1} B^T P - P A - A^T P - Q = 0_n \qquad (7b)$$

Then the closed-loop system,

$$A_c = A - \gamma B K_c = A - \gamma B R^{-1} B^T P, \qquad (8)$$

will enclose the invariant eigenvalues $\lambda_i^-$ $(i = 1, \dots, n^-)$ and at least one additional pair of complex conjugate eigenvalues lying in the open sector of Fig. 1, for the constant gain $\gamma$ in (8) satisfying

$$\gamma > \max\left\{ \frac{1}{2},\ \frac{b + \sqrt{b^2 + ac}}{a} \right\} \qquad (9)$$

where $a = \mathrm{tr}[(BR^{-1}B^TP)^2]$, $b = \mathrm{tr}[BR^{-1}B^TPA]$ and $c = \mathrm{tr}[BR^{-1}B^TQ]$. ■
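The bound in (9) is straightforward to evaluate once $P$ and $Q$ are available. The Python sketch below assumes NumPy and $a > 0$; the helper name is hypothetical. It computes $a$, $b$ and $c$ exactly as defined above. The second term in the maximum is the larger root of $a\gamma^2 - 2b\gamma - c = 0$.

```python
import math

import numpy as np


def gamma_lower_bound(A, B, R, P, Q):
    """Right-hand side of (9), assuming a = tr[(B R^-1 B^T P)^2] > 0."""
    G = B @ np.linalg.solve(R, B.T)           # B R^-1 B^T
    GP = G @ P
    a = float(np.trace(GP @ GP))
    b = float(np.trace(GP @ A))
    c = float(np.trace(G @ Q))
    # (b + sqrt(b^2 + a c)) / a is the larger root of a g^2 - 2 b g - c = 0
    return max(0.5, (b + math.sqrt(b * b + a * c)) / a)


one = np.array([[1.0]])
print(gamma_lower_bound(one, one, one, one, one))          # 1 + sqrt(2)
print(gamma_lower_bound(-one, one, one, 2 * one, one))     # 0.5
```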

Remark 1: The matrix $Q$, the solution of the Riccati equation in (7a), contains the eigenvectors associated with the eigenvalues $\lambda_i^-$ ($i = 1, \ldots, n^-$) of $A$ lying in the open sector of Fig. 1. This matrix is used as the state weighting matrix in the Riccati equation (7b) when solving for $P$. As a result, the asymptotically stable closed-loop system matrix $A_c$ in (8) retains the invariant eigenvectors and associated eigenvalues $\lambda_i^-$ ($i = 1, \ldots, n^-$) of $A$. The steady-state solutions of the Riccati equations in (6) and (7) can be found using matrix sign function techniques [4,17] or the eigenvalue-eigenvector approach [7]; a brief review is given in the Appendix.
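Remark 1 points to the matrix sign function as one way to obtain the Riccati solutions. The sketch below assumes the plain Newton iteration $Z \leftarrow \tfrac12(Z + Z^{-1})$ for the sign function and Roberts' formula for reading the stabilizing solution off the sign of the Hamiltonian; it is an illustration, not the Appendix algorithm of the paper, and the function names are hypothetical. Because (7a) is the standard Riccati equation with $A$ replaced by $-A^2$ and a zero state weight, it can be handled by calling `care_sign` with $-A^2$ in place of $A$ and a zero weighting matrix.

```python
import numpy as np


def matrix_sign(Z, tol=1e-12, max_iter=100):
    """sign(Z) by the Newton iteration Z <- (Z + Z^{-1}) / 2.

    Converges when Z has no eigenvalues on the imaginary axis.
    """
    Z = np.array(Z, dtype=float)
    for _ in range(max_iter):
        Z_next = 0.5 * (Z + np.linalg.inv(Z))
        if np.linalg.norm(Z_next - Z, 1) <= tol * np.linalg.norm(Z_next, 1):
            return Z_next
        Z = Z_next
    return Z


def care_sign(A, B, R, Q):
    """Stabilizing X of A^T X + X A - X B R^{-1} B^T X + Q = 0 (Roberts' method).

    With W = sign(Hamiltonian), the stable subspace spanned by [I; X] is the
    null space of W + I, so [W12; W22 + I] X = -[W11 + I; W21].
    """
    n = A.shape[0]
    S = B @ np.linalg.solve(R, B.T)
    W = matrix_sign(np.block([[A, -S], [-Q, -A.T]]))
    I = np.eye(n)
    lhs = np.vstack([W[:n, n:], W[n:, n:] + I])
    rhs = -np.vstack([W[:n, :n] + I, W[n:, :n]])
    X = np.linalg.lstsq(lhs, rhs, rcond=None)[0]
    return 0.5 * (X + X.T)
```

With $A_1$ and $B_s$ from (50b) and (49d), the computed $Q_1$ should be close to (50c) and give $\tfrac12\operatorname{tr}[BR^{-1}B^TQ_1] \approx 20.48$.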

Continuous-time  Design  Procedure 

Step 1: Let the given continuous-time system be as in (1). Specify $h$ so that the $-h$ vertical line on the negative real axis is the line beyond which the eigenvalues have to be placed in the sector of Fig. 1. Also, assign $A_0 = A$ and the positive definite matrix $R$, and set $i = 1$. If the system is unstable, solve (6) to obtain the closed-loop system $A_1 = A - \gamma_0 BR^{-1}B^{T}P_0 = A - \gamma_0 BK_0$ with $\gamma_0 = 1$; otherwise (stable system), go to Step 2 with $A_1 = A$, $P_0 = 0_n$ and $\gamma_0 = 0$.

Step 2: Solve (7a) for $Q_i$ with $A := A_i$, and check whether $\frac{1}{2}\operatorname{tr}[BR^{-1}B^{T}Q_i]$ is zero. If it is, go to Step 4 with $j = i$; otherwise, go to Step 3. Note that when $\frac{1}{2}\operatorname{tr}[BR^{-1}B^{T}Q_i] = 0$, all eigenvalues of the matrix $A_i$ lie within the open sector of Fig. 1 or on its boundary.

Step 3: Solve (7b) for $P_i$ with $A := A_i$ and $Q := Q_i$. Then the constant gain $\gamma_i$ can be evaluated using (9). The closed-loop system matrix is

$$A_{i+1} = A_i - \gamma_i BR^{-1}B^{T}P_i = A_i - \gamma_i BK_i \tag{10a}$$

Set $i := i + 1$ and go to Step 2.

Step 4: Check whether $\operatorname{tr}[(A_j + hI_n)^{+}]$ (the sum of the eigenvalues to the right of the vertical line at $-h$) is zero. If it is, go to Step 5 with $P_{j+1} = 0_n$ and $\gamma_{j+1} = 0$; otherwise, solve (6) for $P_{j+1}$ with $A := A_j$ and obtain the closed-loop system $A_j - \gamma_{j+1}BR^{-1}B^{T}P_{j+1} = A_j - \gamma_{j+1}BK_{j+1}$, with $\gamma_{j+1} = 1$ and $K_{j+1} = R^{-1}B^{T}P_{j+1}$.

Step 5: The designed closed-loop system is

$$A - BR^{-1}B^{T}\sum_{k=0}^{j+1}\gamma_k P_k \tag{10b}$$

and its eigenvalues lie in the hatched region of Fig. 1. Note that the system matrix in (10b) equals the system matrix in (5), $A - BR^{-1}B^{T}P$, where $P$ is the solution of the Riccati equation in (4) with

$$Q = 2h(P_0 + P_{j+1}) + \sum_{i=1}^{j}\left(Q_i + \Delta\gamma_i P_iBR^{-1}B^{T}P_i\right)\gamma_i \tag{10c}$$

In the above equation, $\Delta\gamma_i = \gamma_i - 1$ and the matrix $R$ is as originally assigned. Also, the optimal continuous-time regulator can be given as

$$u_c(t) = -\sum_{i=0}^{j+1}\gamma_i K_i x_c(t) + E_c r(t) = -K_c x_c(t) + E_c r(t) \tag{10d}$$

where $r(t)$ is any reference input, $E_c$ is any forward gain, and $K_c$ is the desired state-feedback gain.

3.  Model  Conversions  and  Digital  Redesign 

For digital implementation of the optimal continuous-time regulator in (10d), we need to convert the continuous-time system in (1) and the continuous-time control law in (10d) into an equivalent discrete-time model and discrete-time control law, respectively.

3.1  Model  Conversions

Let the state equation of the digital system that approximates the continuous-time system in (1) be represented by

$$\dot{x}_d(t) = Ax_d(t) + Bu_d(t); \quad x_d(0) \tag{11a}$$

where $u_d(t)$ is a piecewise-constant input function,

$$u_d(t) = u_d(kT) \quad \text{for } kT \le t < (k+1)T \tag{11b}$$

where $T$ is the sampling period. Then the equivalent discrete-time model can be written as

$$x_d(kT + T) = Gx_d(kT) + Hu_d(kT); \quad x_d(0) \tag{12a}$$

where

$$G = \exp(AT) \quad \text{and} \quad H = [G - I_n]A^{-1}B \tag{12b}$$

In general, the matrices $G$ and $H$ can be determined exactly from the matrices $A$ and $B$ and the input function $u_d(t)$ in (11b) using the eigenvalue and eigenvector approach [13]. For computational purposes, however, approximations are needed to obtain $G$ and $H$ without computing the eigenvalues explicitly. A number of methods are available [13] for evaluating $G$ and $H$ in (12) approximately; the simplest is truncating the infinite series of $\exp(AT)$ [13], which gives a good approximation when $T$ is sufficiently small. A popular method for determining $G$ and $H$ approximately is the Padé approximation method [13,20]. Some of the approximations obtained with this method are listed below:

$$G_3 = [I_n - \tfrac{1}{2}AT]^{-1}[I_n + \tfrac{1}{2}AT] \cong G \tag{13a}$$

$$G_5 = [I_n - \tfrac{1}{2}AT + \tfrac{1}{12}(AT)^{2}]^{-1}[I_n + \tfrac{1}{2}AT + \tfrac{1}{12}(AT)^{2}] \cong G \tag{13b}$$

and

$$H_3 = [I_n - \tfrac{1}{2}AT]^{-1}BT \cong H \tag{14a}$$

$$H_5 = [I_n - \tfrac{1}{2}AT + \tfrac{1}{12}(AT)^{2}]^{-1}BT \cong H \tag{14b}$$

Note that the matrices $G_3$ in (13a) and $H_3$ in (14a) correspond to the popular Tustin approximation (bilinear transformation) [6]. The matrices $G_5$ and $H_5$ remain good approximations even for relatively large sampling periods. Using the scaling and squaring method [13], shown below, together with one of the above approximations gives still better approximations:

$$G = [e^{AT/m}]^{m}, \quad m \text{ a power of two} \tag{15}$$
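The approximations (13) to (15) amount to a few linear solves. The sketch below (hypothetical helpers `pade_models` and `expm_scaled`) forms $G_3$, $H_3$, $G_5$ and $H_5$ and applies scaling and squaring with the Padé form of (13b) as the inner approximant; the choice $m = 16$ is illustrative.

```python
import numpy as np


def pade_models(A, B, T):
    """Bilinear (G3, H3) and second-order Pade (G5, H5) models, (13)-(14)."""
    I = np.eye(A.shape[0])
    AT = A * T
    D3, N3 = I - AT / 2, I + AT / 2
    D5 = I - AT / 2 + AT @ AT / 12
    N5 = I + AT / 2 + AT @ AT / 12
    G3, H3 = np.linalg.solve(D3, N3), np.linalg.solve(D3, B * T)
    G5, H5 = np.linalg.solve(D5, N5), np.linalg.solve(D5, B * T)
    return G3, H3, G5, H5


def expm_scaled(A, T, m=16):
    """Scaling and squaring (15): G = [exp(AT/m)]^m, with exp(AT/m)
    approximated by the Pade form of (13b); m must be a power of two."""
    I = np.eye(A.shape[0])
    X = A * (T / m)
    Gm = np.linalg.solve(I - X / 2 + X @ X / 12, I + X / 2 + X @ X / 12)
    for _ in range(int(round(np.log2(m)))):
        Gm = Gm @ Gm                       # square log2(m) times
    return Gm
```

On a diagonal test matrix, whose exponential is known exactly, $G_5$ is noticeably more accurate than $G_3$, and the scaled-and-squared value is more accurate still.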


Conversely, given a discrete-time model as in (12), an equivalent continuous-time model as in (11) can be obtained by using

$$A = \frac{1}{T}\ln(G) \quad \text{and} \quad B = A[G - I_n]^{-1}H \tag{16}$$

As before, the matrix $A$ can be obtained exactly from its discrete equivalent $G$ by using the eigenvalue and eigenvector approach. It can also be obtained approximately by truncating the infinite power series of the matrix logarithm $\ln(G)$, subject to certain convergence conditions. Shieh et al. [20] have proposed a direct truncation method and a matrix continued fraction method for determining $A$ from $G$. The commonly used approximation of $\frac{1}{T}\ln(G)$ obtained with the matrix continued fraction method is

$$A = \frac{1}{T}\ln(G) \cong \frac{2}{T}R\left[I_n - \tfrac{1}{3}R^{2}\right]^{-1} \tag{17}$$

where $R = [G - I_n][G + I_n]^{-1}$. The matrix series approximations obtained from truncation or continued fractions converge when $\operatorname{Re}(\sigma(G)) > 0$, where $\sigma(G)$ denotes the eigenvalues of $G$. In general, the eigenvalues of $G$ are not available, and they do not always lie in the right half of the complex $z$-plane. To satisfy the convergence condition, the principal $q$th root of $G$ [20,21,24] can be used. Shieh et al. [21] and Tsai et al. [24] have recently developed a fast and stable algorithm for computing the principal $q$th root of a general complex matrix; it is listed in Appendix A-1. The eigenvalues of $\sqrt[q]{G}$ lie in the right half of the complex $z$-plane, i.e., $\operatorname{Re}(\sigma(\sqrt[q]{G})) > 0$, for $q \ge 2$. Therefore, the principal $q$th root of $G$ can be used instead of $G$ in determining an approximation of $A$. In this case, the matrix equation (16) becomes

$$A = \frac{1}{T}\ln(G) = \frac{q}{T}\ln(\sqrt[q]{G}) \tag{18}$$

As a result, the matrix $R$ in (17) becomes $R := [\sqrt[q]{G} - I_n][\sqrt[q]{G} + I_n]^{-1}$, and the constant factor $2/T$ is replaced by $2q/T$. The condition for convergence of the power series of $\ln(\sqrt[q]{G})$ becomes $\arg(\sigma(G)) \ne \pi$ and $\det(G) \ne 0$, which is much less restrictive.
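The continuous-time recovery of (16) to (18) can be sketched as follows. The principal $q$th root is computed here by repeated Denman-Beavers square-root iterations, so $q$ is restricted to powers of two; this is a swapped-in technique, not the fast algorithm of Appendix A-1. Only the first continued-fraction term of (17) is used, and the function names are hypothetical.

```python
import numpy as np


def sqrtm_db(G, iters=50):
    """Principal square root by the Denman-Beavers iteration
    (assumes G has no eigenvalues on the closed negative real axis)."""
    Y = np.array(G, dtype=complex)
    Z = np.eye(Y.shape[0], dtype=complex)
    for _ in range(iters):
        Y, Z = 0.5 * (Y + np.linalg.inv(Z)), 0.5 * (Z + np.linalg.inv(Y))
    return Y


def d2c_continued_fraction(G, H, T, q=4):
    """Continuous-time (A, B) from (G, H) via (16)-(18).

    A ~ (2q/T) R [I - R^2/3]^{-1} with R = [G^(1/q) - I][G^(1/q) + I]^{-1}
    (first continued-fraction term of (17)); B = A [G - I]^{-1} H.
    q is assumed to be a power of two.
    """
    n = G.shape[0]
    I = np.eye(n)
    Gq = np.array(G, dtype=complex)
    for _ in range(int(round(np.log2(q)))):
        Gq = sqrtm_db(Gq)                      # builds the principal q-th root
    R = (Gq - I) @ np.linalg.inv(Gq + I)
    A = ((2 * q / T) * R @ np.linalg.inv(I - R @ R / 3)).real
    B = A @ np.linalg.solve(G - I, H)
    return A, B
```

Applied to the exact discretization of a known $2\times2$ system with $T = 0.2$ and $q = 4$, it recovers $A$ and $B$ to within about $10^{-5}$.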

3.2  Digital  Redesign  by  Matching  of  States

Let the digital control law for the discrete-time open-loop system in (12) be

$$u_d(kT) = -K_d x_d(kT) + E_d r(kT) \tag{19}$$

The designed hybrid closed-loop system in (11) then becomes

$$\dot{x}_d(t) = Ax_d(t) - BK_d x_d(kT) + BE_d r(kT); \quad x_d(0) \tag{20}$$

for $kT \le t < (k+1)T$, where $K_d \in \mathbb{R}^{m\times n}$ and $E_d \in \mathbb{R}^{m\times m}$ are the digital state-feedback and forward gains, respectively. A zero-order hold is used in (20). The digital redesign problem thus reduces to finding the digital state-feedback gain $K_d$ and forward gain $E_d$ in (19) from the continuous state-feedback gain $K_c$ and forward gain $E_c$ in (3), so that the states of the digital system in (20) are approximately equal to the states of the continuous-time system in (5) at the sampling instants, for a given $r(t)$.

Assuming $r(t) \cong r(kT)$ over one sampling period, the respective discrete models of (5) and (20) are

$$x_c(kT + T) = \bar{G}x_c(kT) + \bar{H}E_c r(kT); \quad x_c(0) \tag{21}$$

and

$$x_d(kT + T) = (G - HK_d)x_d(kT) + HE_d r(kT); \quad x_d(0) \tag{22}$$

where $\bar{G} = e^{(A - BK_c)T}$, $\bar{H} = \int_0^T e^{(A - BK_c)\tau}\,d\tau\,B = [\bar{G} - I_n](A - BK_c)^{-1}B$, $G = e^{AT}$ and $H = \int_0^T e^{A\tau}\,d\tau\,B = [G - I_n]A^{-1}B$. To match all $n$ states of the digital system, $x_d(kT)$ in (22), with those of the continuous-time system, $x_c(kT)$ in (21), at each sampling instant, it is sufficient that the following equations are satisfied:

$$\bar{G} = G - HK_d \tag{23a}$$

and

$$\bar{H}E_c = HE_d \tag{23b}$$

where the $m \times n$ (non-square) feedback gain $K_d$ and the $m \times m$ forward gain $E_d$ are the unknown matrices to be solved for.

An alternative representation of (23) is

$$e^{(A - BK_c)T} = e^{AT} - (e^{AT} - I_n)A^{-1}BK_d \tag{24a}$$

$$[(e^{(A - BK_c)T} - I_n)(A - BK_c)^{-1}B]E_c = [(e^{AT} - I_n)A^{-1}B]E_d \tag{24b}$$


In the existing digital redesign technique [10], $K_d$ and $E_d$ in (24) are considered functions of the sampling period, i.e., $K_d(T)$ and $E_d(T)$, respectively, and they are expanded into Taylor series about $T = 0$ as

$$K_d(T) = \sum_{k=0}^{\infty}\frac{1}{k!}K_d^{(k)}(0)\,T^{k} \tag{25a}$$

where $K_d^{(k)}(0) = \left.\dfrac{d^{k}K_d(T)}{dT^{k}}\right|_{T=0}$, and

$$E_d(T) = \sum_{k=0}^{\infty}\frac{1}{k!}E_d^{(k)}(0)\,T^{k} \tag{25b}$$

where $E_d^{(k)}(0) = \left.\dfrac{d^{k}E_d(T)}{dT^{k}}\right|_{T=0}$.

In a similar manner, the exponential matrix-valued functions in (24) are expanded into Taylor series about $T = 0$. As a result, (24) together with (25) can be written as

$$\sum_{k=0}^{\infty}\frac{(A - BK_c)^{k}T^{k}}{k!} = \sum_{k=0}^{\infty}\frac{A^{k}T^{k}}{k!} - \sum_{i=0}^{\infty}\frac{A^{i}BT^{i+1}}{(i+1)!}\sum_{k=0}^{\infty}\frac{K_d^{(k)}(0)T^{k}}{k!} \tag{26a}$$

$$\sum_{i=0}^{\infty}\frac{(A - BK_c)^{i}BT^{i+1}}{(i+1)!}E_c = \sum_{i=0}^{\infty}\frac{A^{i}BT^{i+1}}{(i+1)!}\sum_{k=0}^{\infty}\frac{E_d^{(k)}(0)T^{k}}{k!} \tag{26b}$$

To match all the states of the continuous and digital systems with a sufficiently small sampling period, the approximate digital feedback gain (denoted $\hat{K}_d$) and the approximate digital forward gain (denoted $\hat{E}_d$) are determined [10] by taking the first two terms of all associated matrix series expansions in (26), giving

$$\hat{K}_d = K_c + \tfrac{1}{2}K_c(A - BK_c)T \tag{27a}$$

$$\hat{E}_d = [I_m - \tfrac{1}{2}K_cBT]E_c \tag{27b}$$

In this paper, we propose a bilinear transform method that solves explicitly for the desired $m \times n$ (non-square) feedback gain $K_d$ and $m \times m$ forward gain $E_d$ in (24) by retaining all terms of modified Taylor series expansions of $e^{(A - BK_c)T}$ and $e^{AT}$ in (24), as follows.

The matrix-valued function $e^{XT}$, with $X \in \mathbb{R}^{n\times n}$ and sampling period $T$, can be represented by the infinite series [6]

$$e^{XT} = I_n + XT + \frac{1}{2!}(XT)^{2} + \sum_{i=3}^{\infty}\frac{1}{i!}(XT)^{i} \tag{28}$$

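The infinite series in (28) can be approximated by a geometric series as

$$e^{XT} \cong I_n + XT + \frac{1}{2}(XT)^{2} + \sum_{i=3}^{\infty}\frac{1}{2^{i-1}}(XT)^{i} \tag{29a}$$

$$= I_n + XT\left[I_n + \left(\tfrac{1}{2}XT\right) + \left(\tfrac{1}{2}XT\right)^{2} + \left(\tfrac{1}{2}XT\right)^{3} + \cdots\right] \tag{29b}$$

$$= I_n + XT\left[I_n - \tfrac{1}{2}XT\right]^{-1} \quad \text{for } \left\|\tfrac{1}{2}XT\right\| < 1 \tag{29c}$$

$$= \left[I_n - \tfrac{1}{2}XT\right]^{-1}\left[I_n + \tfrac{1}{2}XT\right] \quad \text{for } \left\|\tfrac{1}{2}XT\right\| < 1 \tag{29d}$$

Note that the first three dominant terms of (28) equal those of (29a), while the remaining terms differ in their weighting factors: $1/i!$ in (28) versus $1/2^{i-1}$ in (29a). Also, the sampling period $T$ in (29) can be chosen to satisfy the sufficient condition $\|\tfrac{1}{2}XT\| < 1$.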
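The difference between the weights in (28) and (29a) can be checked with exact rational arithmetic. The scalar sketch below (standard library only; the function names are illustrative) prints both weight sequences and evaluates the scalar form of (29d).

```python
from fractions import Fraction
from math import exp, factorial


def taylor_weight(i):
    """Weight 1/i! of (XT)^i in the exponential series (28)."""
    return Fraction(1, factorial(i))


def geometric_weight(i):
    """Weight of (XT)^i in the geometric-series surrogate (29a):
    1 for i = 0, and 1/2^(i-1) for i >= 1."""
    return Fraction(1) if i == 0 else Fraction(1, 2 ** (i - 1))


def tustin(x):
    """Scalar version of (29d): (1 + x/2) / (1 - x/2), valid for |x/2| < 1."""
    return (1 + x / 2) / (1 - x / 2)


for i in range(6):
    print(i, taylor_weight(i), geometric_weight(i))
```

The first three weights agree, after which $1/2^{i-1}$ decays more slowly than $1/i!$; summing the geometric weights for $|x/2| < 1$ gives back $(1 + x/2)/(1 - x/2)$.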

When the matrix $X$ in (28) is a continuous-time system matrix, $e^{XT}$ becomes the equivalent discrete-time system matrix, and the model conversion in (29d) corresponds to the popular Tustin approximation (bilinear transformation). The selection of a suitable sampling period $T$ for the Tustin approximation method and its applications to digital control system design in the frequency domain have been investigated in [5]. Also, a complete
analysis,  design  and  implementation  of  pseudo-continuous-time  controllers,  developed  via 
the  frequency-domain  bilinear  transformation,  for  discrete-time  systems  can  be  found  in 


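With the bilinear transformation in (29d), we express the matrices $\bar{G}$, $\bar{H}$, $G$ and $H$ in (23) as follows:

$$\bar{G} = e^{(A - BK_c)T} \cong [I_n - \tfrac{1}{2}(A - BK_c)T]^{-1}[I_n + \tfrac{1}{2}(A - BK_c)T] \tag{30a}$$

$$\bar{H} = [e^{(A - BK_c)T} - I_n](A - BK_c)^{-1}B \cong [I_n - \tfrac{1}{2}(A - BK_c)T]^{-1}BT \tag{30b}$$

$$G = e^{AT} \cong [I_n - \tfrac{1}{2}AT]^{-1}[I_n + \tfrac{1}{2}AT] \tag{30c}$$

$$H = [e^{AT} - I_n]A^{-1}B \cong [I_n - \tfrac{1}{2}AT]^{-1}BT \tag{30d}$$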


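To solve explicitly for the desired $m \times n$ (non-square) feedback gain $K_d$ and $m \times m$ forward gain $E_d$ in (23), we make use of the following matrix inversion formula [7]:

$$(A + BC^{-1}D)^{-1} = A^{-1} - A^{-1}B(C + DA^{-1}B)^{-1}DA^{-1} \tag{31}$$

Thus the matrix $[I_n - \tfrac{1}{2}(A - BK_c)T]^{-1}$ in (30a) can be represented as

$$\begin{aligned}
[(I_n - \tfrac{1}{2}AT) + \tfrac{1}{2}BK_cT]^{-1} &= [(I_n - \tfrac{1}{2}AT) + BT(2I_m)^{-1}K_c]^{-1} \\
&= (I_n - \tfrac{1}{2}AT)^{-1} - (I_n - \tfrac{1}{2}AT)^{-1}BT[2I_m + K_c(I_n - \tfrac{1}{2}AT)^{-1}BT]^{-1}K_c(I_n - \tfrac{1}{2}AT)^{-1}
\end{aligned} \tag{32}$$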
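The identity (32) replaces an $n\times n$ closed-loop inverse by the open-loop factor $(I_n - \tfrac12 AT)^{-1}$ and one $m\times m$ inverse. A quick numerical check on random matrices (an illustrative assumption; the function name is hypothetical) confirms it:

```python
import numpy as np


def closed_loop_bilinear_inverse(A, B, Kc, T):
    """[I_n - (A - B Kc) T / 2]^{-1} evaluated through (32), which needs the
    open-loop factor L = (I_n - A T / 2)^{-1} plus one m x m inverse."""
    n, m = B.shape
    L = np.linalg.inv(np.eye(n) - A * T / 2)
    BT = B * T
    core = np.linalg.inv(2 * np.eye(m) + Kc @ L @ BT)   # [2 I_m + Kc L B T]^{-1}
    return L - L @ BT @ core @ Kc @ L
```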

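Substituting (30) and (32) into (23) and using the inverse bilinear transformations in (30c) and (30d), we have

$$\begin{aligned}
\bar{G} \cong{}& (I_n - \tfrac{1}{2}AT)^{-1}(I_n + \tfrac{1}{2}AT) - (I_n - \tfrac{1}{2}AT)^{-1}BT[2I_m + K_c(I_n - \tfrac{1}{2}AT)^{-1}BT]^{-1} \\
&\times K_c(I_n - \tfrac{1}{2}AT)^{-1}(I_n + \tfrac{1}{2}AT) - \tfrac{1}{2}(I_n - \tfrac{1}{2}AT)^{-1}BTK_c \\
&+ \tfrac{1}{2}(I_n - \tfrac{1}{2}AT)^{-1}BT[2I_m + K_c(I_n - \tfrac{1}{2}AT)^{-1}BT]^{-1}K_c(I_n - \tfrac{1}{2}AT)^{-1}BTK_c \\
\cong{}& G - H[2I_m + K_cH]^{-1}K_cG - \tfrac{1}{2}HK_c + \tfrac{1}{2}H[2I_m + K_cH]^{-1}K_cHK_c \\
={}& G - H\left\{\tfrac{1}{2}(I_m + \tfrac{1}{2}K_cH)^{-1}K_c(I_n + G)\right\}
\end{aligned} \tag{33a}$$

and

$$\begin{aligned}
\bar{H}E_c \cong{}& \left[(I_n - \tfrac{1}{2}AT)^{-1}BT - (I_n - \tfrac{1}{2}AT)^{-1}BT[2I_m + K_c(I_n - \tfrac{1}{2}AT)^{-1}BT]^{-1}K_c(I_n - \tfrac{1}{2}AT)^{-1}BT\right]E_c \\
\cong{}& [H - H(2I_m + K_cH)^{-1}K_cH]E_c \\
={}& H(I_m + \tfrac{1}{2}K_cH)^{-1}E_c
\end{aligned} \tag{33b}$$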

Note that the exact system matrix $G$ and input matrix $H$ are used in (33) as approximations of the respective bilinear-transformed system matrix, $(I_n - \tfrac{1}{2}AT)^{-1}(I_n + \tfrac{1}{2}AT)$, and input matrix, $(I_n - \tfrac{1}{2}AT)^{-1}BT$. These inverse approximations can be justified by the same reasoning as in (28) and (29). Comparing (33) with (23), we obtain the desired digital state-feedback gain $K_d$ and forward gain $E_d$ as

$$K_d = \tfrac{1}{2}(I_m + \tfrac{1}{2}K_cH)^{-1}K_c(I_n + G) \tag{34a}$$

and

$$E_d = (I_m + \tfrac{1}{2}K_cH)^{-1}E_c \tag{34b}$$

If the exact system matrix $G$ in (34a) and input matrix $H$ in (34b) are replaced by the bilinear-transformed models $G_3$ and $H_3$ in (13a) and (14a), respectively, the resulting digital redesign gains in (34a) and (34b) reduce to

$$K_d = K_c[I_n - \tfrac{1}{2}(A - BK_c)T]^{-1} \tag{34c}$$

and

$$E_d = (I_m - \tfrac{1}{2}K_dBT)E_c \tag{34d}$$

Since the matrix exponential formulation $G_3$ in (13a) is equivalent to a $w$-plane matrix representation (bilinear transformation) [6], the controller in (19) that uses the digital gains in (34c) and (34d) is equivalent to a $w$-plane pseudo-continuous controller [6].
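The gains (34a) and (34b) and their bilinear special case (34c) and (34d) are easy to cross-check numerically. In the sketch below, `redesign_gains` and `bilinear_models` are hypothetical helper names, and the random test matrices are an illustrative assumption; when $G$ and $H$ are replaced by $G_3$ and $H_3$, the two sets of formulas should agree to round-off.

```python
import numpy as np


def redesign_gains(G, H, Kc, Ec):
    """Digital gains (34a)-(34b) from the continuous gains Kc and Ec."""
    m, n = Kc.shape
    M = np.linalg.inv(np.eye(m) + 0.5 * Kc @ H)       # (I_m + Kc H / 2)^{-1}
    Kd = 0.5 * M @ Kc @ (np.eye(n) + G)
    Ed = M @ Ec
    return Kd, Ed


def bilinear_models(A, B, T):
    """G3 and H3 from (13a) and (14a)."""
    I = np.eye(A.shape[0])
    D = I - A * T / 2
    return np.linalg.solve(D, I + A * T / 2), np.linalg.solve(D, B * T)
```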

To compare these gains with the digital redesign gains [10] in (27), we represent the matrices $(I_m + \tfrac{1}{2}K_cH)^{-1}$ and $G\,(= e^{AT})$ in (34a) and (34b) by their respective infinite series

$$(I_m + \tfrac{1}{2}K_cH)^{-1} = I_m - \tfrac{1}{2}K_cH + \sum_{i=2}^{\infty}\left(-\tfrac{1}{2}K_cH\right)^{i} \tag{35a}$$

and

$$G = e^{AT} = I_n + AT + \sum_{i=2}^{\infty}\frac{1}{i!}(AT)^{i} \tag{35b}$$

When the sampling period $T$ is sufficiently small, we can approximate the respective infinite series in (35) by their first two terms and, with $H \cong BT$, solve for the desired gains, i.e.,

$$\begin{aligned}
K_d &\cong \tfrac{1}{2}[I_m - \tfrac{1}{2}K_cH]K_c(2I_n + AT) \\
&= K_c + \tfrac{1}{2}K_c(A - BK_c)T - \tfrac{1}{4}K_cBK_cAT^{2} \\
&\cong K_c + \tfrac{1}{2}K_c(A - BK_c)T = \hat{K}_d
\end{aligned} \tag{36a}$$

and

$$E_d \cong (I_m - \tfrac{1}{2}K_cBT)E_c = \hat{E}_d \tag{36b}$$

The approximate gains ($\hat{K}_d$ and $\hat{E}_d$) in (36) are those obtained in (27). Thus, we conclude that the digital redesign gains in (27) are approximations of the proposed digital redesign gains in (34).
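To see numerically that (27) approximates (34), one can compare both gain pairs for shrinking $T$, using exact $G$ and $H$. The plant and gains below are illustrative assumptions, not taken from the paper, and `exact_models` assumes a diagonalizable, nonsingular $A$. Since (27) keeps only the terms through first order in $T$, the gap should shrink by roughly a factor of four each time $T$ is halved.

```python
import numpy as np


def exact_models(A, B, T):
    """G = e^{AT} and H = (G - I) A^{-1} B via an eigendecomposition
    (assumes A is diagonalizable and nonsingular)."""
    w, V = np.linalg.eig(A)
    G = (V @ np.diag(np.exp(w * T)) @ np.linalg.inv(V)).real
    return G, (G - np.eye(A.shape[0])) @ np.linalg.solve(A, B)


def gains_34(G, H, Kc, Ec):
    """Proposed gains (34a)-(34b)."""
    M = np.linalg.inv(np.eye(Kc.shape[0]) + 0.5 * Kc @ H)
    return 0.5 * M @ Kc @ (np.eye(G.shape[0]) + G), M @ Ec


def gains_27(A, B, Kc, Ec, T):
    """First-order gains (27a)-(27b)."""
    Kd_hat = Kc + 0.5 * Kc @ (A - B @ Kc) * T
    Ed_hat = (np.eye(Kc.shape[0]) - 0.5 * Kc @ B * T) @ Ec
    return Kd_hat, Ed_hat


# Illustrative second-order plant and gains (assumed, not from the paper).
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Kc = np.array([[1.0, 0.5]])
Ec = np.array([[1.0]])


def gap(T):
    """Norms of (34) minus (27) for the feedback and forward gains."""
    Kd, Ed = gains_34(*exact_models(A, B, T), Kc, Ec)
    Kd_hat, Ed_hat = gains_27(A, B, Kc, Ec, T)
    return np.linalg.norm(Kd - Kd_hat), np.linalg.norm(Ed - Ed_hat)
```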

4.  Pseudo-continuous-time  Suboptimal  Quadratic  Regulators 

4.1  Block-diagonalization  via  matrix  sign  function

In  the  following,  the  results  leading  to  the  decomposition  of  a  continuous  system  into 
a  multi-time  scale  structure  are  presented. 

Definition 1 [18]: Let the eigenvalues of a continuous-time stable system matrix $A \in \mathbb{R}^{n\times n}$ be $\lambda_i$, $i = 1, \ldots, n$. The non-dominant modes of this system are the modes with $\operatorname{Re}(\lambda_i) < -h$, where $h$ is a positive real number, while the dominant modes are those with $\operatorname{Re}(\lambda_i) > -h$, where $\operatorname{Re}(\cdot)$ denotes the real part of $(\cdot)$.

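Theorem 2 [17]: Let $A \in \mathbb{R}^{n\times n}$ and $\{\operatorname{Re}(\sigma(A))\} \cap \{h_i,\ i = 0, 1, \ldots, k\} = \emptyset$, where $\sigma(A)$ denotes the eigenspectrum of $A$ and $h_i \in \mathbb{R}$, $i = 0, 1, \ldots, k$. Let a set of matrix sign functions (see Appendix A-2) be

$$\operatorname{sign}_{(h_i)}(A) = \operatorname{sign}(A - h_iI_n) \quad \text{for } i = 0, 1, \ldots, k \tag{37a}$$

Define

$$S_i = \operatorname{ind}\left[\operatorname{sign}_{(h_{i-1},h_i)}(A)\right] \in \mathbb{R}^{n\times n_i}, \quad 1 \le i \le k \tag{37b}$$

where $\operatorname{ind}(\cdot)$ denotes the collection of the linearly independent column vectors of $(\cdot)$, and

$$\operatorname{sign}_{(h_{i-1},h_i)}(A) = \tfrac{1}{2}\left[\operatorname{sign}_{(h_{i-1})}(A) - \operatorname{sign}_{(h_i)}(A)\right] \tag{37c}$$

with $h_0 = 0$ and $\operatorname{sign}_{(h_0)}(A) = I_n$. Assume that $n_i \ne 0$ for $1 \le i \le k$. Then

$$A_r = M_r^{-1}AM_r = \operatorname{block\,diag}[A_{Rk}, A_{R(k-1)}, \ldots, A_{R1}] \tag{38a}$$

where $M_r$ is the right block modal matrix given by

$$M_r = [S_k, S_{k-1}, \ldots, S_1] \tag{38b}$$

and

$$A_{Ri} = S_i^{+}AS_i \in \mathbb{R}^{n_i\times n_i} \quad \text{for } 1 \le i \le k \tag{38c}$$

where $S_i^{+} \in \mathbb{R}^{n_i\times n}$ is the left inverse of $S_i$, defined as $S_i^{+} = (S_i^{T}S_i)^{-1}S_i^{T}$. ■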
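A two-way split in the spirit of Theorem 2, at the vertical line $-h$, can be sketched with NumPy. Here an orthonormal basis of each spectral projector's range (from an SVD) stands in for the column selection $\operatorname{ind}(\cdot)$ of (37b), and the unscaled Newton iteration is used for the sign function; both choices are assumptions of this sketch, as are the function names.

```python
import numpy as np


def matrix_sign(Z, tol=1e-12, max_iter=100):
    """sign(Z) by the Newton iteration Z <- (Z + Z^{-1}) / 2."""
    Z = np.array(Z, dtype=float)
    for _ in range(max_iter):
        Z_next = 0.5 * (Z + np.linalg.inv(Z))
        if np.linalg.norm(Z_next - Z, 1) <= tol * np.linalg.norm(Z_next, 1):
            return Z_next
        Z = Z_next
    return Z


def range_basis(P, rel_tol=1e-8):
    """Orthonormal basis of range(P) from its SVD (stand-in for ind(.))."""
    U, s, _ = np.linalg.svd(P)
    return U[:, s > rel_tol * s[0]]


def split_at(A, h):
    """Block-diagonalize A into eigenvalues with Re < -h (fast block first)
    and Re > -h (slow block); assumes no eigenvalue has real part -h."""
    n = A.shape[0]
    I = np.eye(n)
    W = matrix_sign(A + h * I)                 # sign(A - (-h) I)
    S_fast = range_basis(0.5 * (I - W))        # projector onto Re < -h
    S_slow = range_basis(0.5 * (I + W))        # projector onto Re > -h
    M = np.hstack([S_fast, S_slow])
    return M, np.linalg.solve(M, A @ M), S_fast.shape[1]
```

For an upper-triangular test matrix with eigenvalues $-5$, $-0.5$ and $-0.2$ and $h = 1$, the result is a block-diagonal matrix with the single fast eigenvalue $-5$ in the first block.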

4.2  Pseudo-continuous-time  multi-stage  design  procedure


Let the given large-scale discrete-time system with an appropriate sampling period $T$ be

$$x_d(kT + T) = Gx_d(kT) + Hu_d(kT); \quad x_d(0) \tag{39a}$$

Also, let the dimension of the system be $n$ and the number of inputs be $m$. The procedure first transforms the discrete-time system into an equivalent continuous-time model, next decomposes the continuous-time model into a multi-time-scale structure using techniques based on the matrix sign function, then designs each decomposed subsystem via the design technique of Section 2, and finally determines the suboptimal digital regulator for the whole large-scale system via the new digital redesign technique of Section 3.2.

Step 1: Transform the given discrete-time system into an equivalent continuous-time system, using the technique of Section 3.1, as

$$\dot{x}_c(t) = Ax_c(t) + Bu_c(t); \quad x_c(0) \tag{39b}$$

Step 2: Set $i = 1$, $Q := 0_n$ and the feedback gain $K_c := 0_{m\times n}$, and start with $A$ and $B$ as in (39b).

Step 3: Specify a positive real scalar $h_i$ (see Definition 1) and find a transformation matrix $M_1^{(i)}$ such that the matrix $A$ is block-diagonalized into the form

$$A := (M_1^{(i)})^{-1}AM_1^{(i)} = \operatorname{block\,diag}[A_c, A_f, A_s] \tag{40a}$$

where $A_c \in \mathbb{R}^{(n-n_i)\times(n-n_i)}$ represents a block that has already been designed or does not need to be designed, and the matrices $A_f \in \mathbb{R}^{n_f\times n_f}$ and $A_s \in \mathbb{R}^{n_s\times n_s}$, with $n_i = n_f + n_s$, contain the eigenvalues with real parts less than and greater than $-h_i$, respectively. The transformation matrix is given by

$$M_1^{(i)} = \operatorname{block\,diag}[I_{n-n_i}, [S_2, S_1]] \tag{40b}$$

where $S_2 \in \mathbb{R}^{n_i\times n_f}$ and $S_1 \in \mathbb{R}^{n_i\times n_s}$ are as defined in (37) with respect to the matrix sign function of $A_i$; for $i > 1$, $A_i$ is the $n_i \times n_i$ block that remains to be designed, and $A_1 := A$. Transform $B$ as

$$B := (M_1^{(i)})^{-1}B = [B_c^{T}, B_f^{T}, B_s^{T}]^{T} \tag{40c}$$

The dimensions of the matrices $B_c$, $B_f$ and $B_s$ are $(n - n_i)\times m$, $n_f\times m$ and $n_s\times m$, respectively. Accumulate the transformations in $M^{(i)} := M^{(i-1)}M_1^{(i)}$.

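Step 4: The subsystem considered for design at this stage is $(A_s, B_s)$. Design it with the procedure of Section 2, and let the resulting optimal closed-loop continuous-time subsystem be $(A_{ci}, B_s)$, with subsystem feedback gain $K_i$ and weighting matrix $Q_i$.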

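Step 5: Update the feedback gain $K_c := K_c + [0_{m\times(n-n_s)}, K_i](M^{(i)})^{-1}$ and the system matrix

$$A := A - B[0_{m\times(n-n_s)}, K_i] \tag{41}$$

so that

$$A = \begin{bmatrix} \bar{A}_i & \Pi_i \\ 0_{n_s\times(n-n_s)} & A_{ci} \end{bmatrix} \tag{42}$$

and

$$Q := Q + [(M^{(i)})^{-1}]^{T}\operatorname{block\,diag}[0_{n-n_s}, Q_i](M^{(i)})^{-1} \tag{43}$$

where $\bar{A}_i = \operatorname{block\,diag}[A_c, A_f]$, $\Pi_i = -[B_c^{T}, B_f^{T}]^{T}K_i$, and the dimensions of the matrices $\bar{A}_i$, $\Pi_i$ and $Q_i$ are $(n - n_s)\times(n - n_s)$, $(n - n_s)\times n_s$ and $n_s\times n_s$, respectively.

Step 6: Block-diagonalize the partially designed system $A$ and move the last block of $A$ in (42) (viz., $A_{ci}$) to the first block position, via the transformation matrix

$$M_2^{(i)} = \begin{bmatrix} L_i & I_{n-n_s} \\ I_{n_s} & 0_{n_s\times(n-n_s)} \end{bmatrix}, \qquad (M_2^{(i)})^{-1} = \begin{bmatrix} 0_{n_s\times(n-n_s)} & I_{n_s} \\ I_{n-n_s} & -L_i \end{bmatrix} \tag{44a}$$

The matrix $L_i\ (\in \mathbb{R}^{(n-n_s)\times n_s})$ can be solved from the following Lyapunov equation [11,12,14,17]:

$$\bar{A}_iL_i - L_iA_{ci} + \Pi_i = 0 \tag{44b}$$

The transformed system is

$$A := (M_2^{(i)})^{-1}AM_2^{(i)} = \begin{bmatrix} A_{ci} & 0_{n_s\times(n-n_s)} \\ 0_{(n-n_s)\times n_s} & \bar{A}_i \end{bmatrix} \tag{45a}$$

$$B := (M_2^{(i)})^{-1}B = [B_s^{T}, (\bar{B}_i - L_iB_s)^{T}]^{T} \tag{45b}$$

where $\bar{B}_i = [B_c^{T}, B_f^{T}]^{T}$. Accumulate the transformations in $M^{(i)} := M^{(i)}M_2^{(i)}$.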
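Step 6 needs the solution of (44b) and the block swap of (44a) and (45a). A small dense sketch (hypothetical names `solve_44b` and `swap_transform`) solves (44b) by Kronecker vectorization, which is adequate only for small blocks; the test data are $\bar A_1$, $A_{c1}$ and $\Pi_1$ as printed in (49b) and (51c) of the example.

```python
import numpy as np


def solve_44b(Abar, Ac, Pi):
    """Solve Abar L - L Ac + Pi = 0 (44b) by Kronecker vectorization:
    (I kron Abar - Ac^T kron I) vec(L) = -vec(Pi), with column-major vec."""
    p, q = Pi.shape
    K = np.kron(np.eye(q), Abar) - np.kron(Ac.T, np.eye(p))
    vec_L = np.linalg.solve(K, -Pi.flatten(order="F"))
    return vec_L.reshape((p, q), order="F")


def swap_transform(L):
    """M2 and its inverse from (44a); L has shape (n - n_s, n_s)."""
    p, q = L.shape
    M2 = np.block([[L, np.eye(p)], [np.eye(q), np.zeros((q, p))]])
    M2_inv = np.block([[np.zeros((q, p)), np.eye(q)], [np.eye(p), -L]])
    return M2, M2_inv
```

With these data, the computed $L_1$ should agree with (51f) up to the rounding of the printed matrices, and the swapped system is block diagonal with $A_{c1}$ first.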

Step 7: Set $i := i + 1$. If $i > k$ ($k$ is the number of time scales), go to Step 8; otherwise, go to Step 3.

Step 8: Compute the desired digital state-feedback gain $K_d$ and forward gain $E_d$ as

$$K_d = \tfrac{1}{2}(I_m + \tfrac{1}{2}K_cH)^{-1}K_c(I_n + G) \tag{46a}$$

$$E_d = (I_m + \tfrac{1}{2}K_cH)^{-1}E_c \tag{46b}$$

The digital regulator

$$u_d(kT) = -K_dx_d(kT) + E_dr(kT) \tag{47}$$

with $r(kT)$ any reference input, places the eigenvalues of the system in (39a) near the hatched region of Fig. 2. The digital regulator is a suboptimal discrete-time regulator because of the approximations involved in the inputs and in the various model conversions, although the equivalent continuous-time regulator is optimal. Note that, although numerically stable algorithms are suggested in the Appendix for computing some special matrices and functions, the proposed multi-stage design process as a whole does not guarantee numerical stability.

5.  Illustrative  Example

Consider an unstable discrete-time system in (39a) with

$$G = \begin{bmatrix}
0.822 & -0.440 & 0.008 & 0.074 & 0.062 \\
1.244 & 0.725 & 0.249 & 0.000 & 0.124 \\
-0.157 & 0.098 & 0.752 & -0.371 & -0.014 \\
0.037 & 0.176 & 0.284 & 0.724 & -0.025 \\
-0.655 & -0.275 & -0.131 & 0.000 & 0.136
\end{bmatrix}, \quad
H = \begin{bmatrix}
0.271 & 0.017 \\
-0.175 & -0.305 \\
-0.099 & 0.196 \\
0.068 & 0.010 \\
0.092 & 0.213
\end{bmatrix} \tag{48a}$$

where $\sigma(G) = \{0.7252 \pm j0.7466,\ 0.7537 \pm j0.3190,\ 0.2013\}$ and $T = 0.2$.

The locations of the poles of $G$ in the discrete $z$-plane are shown in Fig. 2. Except for the pole at 0.2013, which is to be kept invariant, all poles lie outside the region of interest. The objective is to find pseudo-continuous-time suboptimal regulators for the discrete-time system in (48a) that assign the poles near the specified region in the $z$-plane. The pseudo-continuous-time design procedure of Section 4.2 is used to achieve the desired design.

We use the matrix continued fraction approximation in (17), with $q = 4$ in (18), to obtain the equivalent continuous-time system in (39b) as

$$A = \begin{bmatrix}
0.80993 & -2.05956 & 0.32673 & 0.46503 & 0.89827 \\
6.66468 & 0.19703 & 1.33276 & 0.00065 & 0.66065 \\
-1.29339 & 0.45801 & -1.07402 & -2.32838 & -0.20294 \\
-0.32191 & 0.82377 & 1.67148 & -1.18838 & -0.36133 \\
-3.51498 & -4.31738 & -0.70176 & 0.00000 & -8.36346
\end{bmatrix}, \quad
B = \begin{bmatrix}
0.95385 & -0.38170 \\
-1.66643 & -1.66699 \\
-0.21358 & 1.19415 \\
0.61711 & 0.05240 \\
0.87785 & 1.40500
\end{bmatrix} \tag{48b}$$

where $\sigma(A) = \{0.19984 \pm j3.99969,\ -1.00180 \pm j2.00218,\ -8.01498\}$.
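The eigenvalue lists of (48a) and (48b) are linked through $s = \ln(z)/T$ on the principal branch. This short check (an illustration using the printed three-decimal $G$, not the authors' code) maps the $z$-plane poles of $G$ to the $s$-plane.

```python
import numpy as np

# G and T as printed in (48a)
G = np.array([[0.822, -0.440, 0.008, 0.074, 0.062],
              [1.244, 0.725, 0.249, 0.000, 0.124],
              [-0.157, 0.098, 0.752, -0.371, -0.014],
              [0.037, 0.176, 0.284, 0.724, -0.025],
              [-0.655, -0.275, -0.131, 0.000, 0.136]])
T = 0.2

# Each z-plane pole maps to an s-plane pole through s = ln(z) / T,
# using the principal branch of the complex logarithm.
z_poles = np.linalg.eigvals(G).astype(complex)
s_poles = np.sort_complex(np.log(z_poles) / T)
print(np.round(s_poles, 3))
```

The two poles of $G$ outside the unit circle map to the unstable pair near $0.2 \pm j4$; small discrepancies from $\sigma(A)$ come from the three-decimal rounding of $G$.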

Since the given system is unstable, the first step is to block-decompose the continuous-time system into its stable and unstable parts. Assign $h_1 = 0$. The transformation matrix that block-diagonalizes $A$, found using the matrix sign function technique of Section 4.1, is

$$M_1^{(1)} = [S_2, S_1] = \left[\begin{array}{rrr|rr}
-0.0469 & -0.0550 & -0.2096 & 1.0469 & 0.0550 \\
-0.0001 & 0.0004 & 0.0008 & 0.0001 & 0.9996 \\
0.2331 & 0.0121 & 1.0467 & -0.2331 & -0.0121 \\
0.4191 & 0.0217 & 0.0839 & -0.4191 & -0.0217 \\
0.0012 & 0.5259 & -0.0002 & -0.0012 & -0.5259
\end{array}\right] \tag{49a}$$

where $S_2 \in \mathbb{R}^{5\times3}$ and $S_1 \in \mathbb{R}^{5\times2}$ can be found from (37). The transformed matrices are
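$$A := (M_1^{(1)})^{-1}AM_1^{(1)} = \operatorname{block\,diag}[A_f, A_s] = \begin{bmatrix}
\begin{pmatrix} 0.00005 & -0.00913 & 4.50053 \\ -0.01599 & -8.01497 & -0.00994 \\ -1.11375 & 0.00050 & -2.00366 \end{pmatrix} & 0_{3\times2} \\
0_{2\times3} & \begin{pmatrix} 0.19949 & -2.39917 \\ 6.66793 & 0.20019 \end{pmatrix}
\end{bmatrix} \tag{49b}$$

$$B := (M_1^{(1)})^{-1}B = \begin{bmatrix} B_f \\ B_s \end{bmatrix} = \begin{bmatrix}
\begin{pmatrix} 2.49670 & -0.24850 \\ -0.00073 & 1.00378 \\ -0.55700 & 1.16477 \end{pmatrix} \\
\begin{pmatrix} 0.99896 & -0.00210 \\ -1.66645 & -1.66889 \end{pmatrix}
\end{bmatrix} \tag{49c}$$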


The eigenspectra of the diagonal blocks in (49b) are $\sigma(A_f) = \{-1.002 \pm j2.002,\ -8.015\}$ and $\sigma(A_s) = \{0.200 \pm j4.000\}$. The unstable subsystem $(A_s, B_s)$,

$$A_s = \begin{bmatrix} 0.19949 & -2.39917 \\ 6.66793 & 0.20019 \end{bmatrix}, \quad B_s = \begin{bmatrix} 0.99896 & -0.00210 \\ -1.66645 & -1.66889 \end{bmatrix} \tag{49d}$$

with $\sigma(A_s) = \{0.200 \pm j4.000\}$, is to be designed at this stage. Assign $h = 1.1$ (i.e., the eigenvalues of the closed-loop system should lie to the left of the vertical line at $-1.1$ on the negative real axis of the $s$-plane) and $R = I_2$. To achieve the design, we follow the steps of the continuous-time design procedure in Section 2.1, with $A = A_s$ and $B = B_s$. Solving the Riccati equation in (6) with $(A + hI_2, B)$, we have


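$$P_0 = \begin{bmatrix} 2.246 & -0.038 \\ -0.038 & 0.509 \end{bmatrix}, \quad K_0 = R^{-1}B^{T}P_0 = \begin{bmatrix} 2.308 & -0.886 \\ 0.059 & -0.849 \end{bmatrix} \tag{50a}$$

The resulting closed-loop system is

$$A_1 = A - BK_0 = \begin{bmatrix} -2.106 & -1.516 \\ 10.612 & -2.694 \end{bmatrix} \tag{50b}$$

where $\sigma(A_1) = \{-2.400 \pm j4.000\}$. Note that $|\operatorname{Re}(\sigma(A_1))| > 1.1$. Now, solving the Riccati equation in (7a) with $(-A_1^{2}, B)$, we have

$$Q_1 = \begin{bmatrix} 31.350 & 1.258 \\ 1.258 & 2.489 \end{bmatrix} \quad \text{and} \quad \tfrac{1}{2}\operatorname{tr}[BR^{-1}B^{T}Q_1] = 20.48 \ne 0 \tag{50c}$$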


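Solving the Riccati equation in (7b) with $(A_1, B)$ and $Q_1$, we obtain

$$P_1 = \begin{bmatrix} 4.078 & 0.070 \\ 0.070 & 0.326 \end{bmatrix}, \quad K_1 = R^{-1}B^{T}P_1 = \begin{bmatrix} 3.957 & -0.473 \\ -0.126 & -0.544 \end{bmatrix} \tag{50d}$$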


From (9), the constant gain is chosen as $\gamma_1 = 0.6382$. Therefore, the closed-loop system is

$$A_2 = A_1 - \gamma_1BK_1 = \begin{bmatrix} -4.628 & -1.215 \\ 14.686 & -3.776 \end{bmatrix}, \quad Q_2 = 0_2 \tag{50e}$$

where $\sigma(A_2) = \{-4.2024 \pm j4.2024\}$. Note that all the eigenvalues lie on the boundary of the hatched region in Fig. 1. Also note that $\operatorname{tr}[(A_2 + hI_2)^{+}] = 0$ and $\tfrac{1}{2}\operatorname{tr}[BR^{-1}B^{T}Q_2] = 0$, where $Q_2$ is solved from (7a) with respect to $(-A_2^{2}, B)$. This verifies that the design goal has been achieved for the subsystem in (49d). Let us denote this closed-loop subsystem by $A_{c1} = A_s - B_s(K_0 + \gamma_1K_1)$. The continuous-time feedback gain at this stage is


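$$\bar{K}_1 = K_0 + \gamma_1K_1 = \begin{bmatrix} 4.833 & -1.188 \\ -0.021 & -1.197 \end{bmatrix} \tag{51a}$$

and it is optimal with respect to the performance index in (2) with $R = I_2$ and

$$\bar{Q}_1 = 2hP_0 + \gamma_1[Q_1 + (\gamma_1 - 1)P_1B_sR^{-1}B_s^{T}P_1] = \begin{bmatrix} 21.332 & 1.135 \\ 1.135 & 2.588 \end{bmatrix}$$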
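The weighting matrix $\bar Q_1$ can be re-evaluated from the rounded values printed in (50a), (50c) and (50d). The short check below (plain NumPy; variable names chosen for illustration) reproduces it to the printed precision, consistent with (10c) for $j = 1$ and $P_{j+1} = 0$.

```python
import numpy as np

h, gamma1 = 1.1, 0.6382
B_s = np.array([[0.99896, -0.00210], [-1.66645, -1.66889]])
P0 = np.array([[2.246, -0.038], [-0.038, 0.509]])
P1 = np.array([[4.078, 0.070], [0.070, 0.326]])
Q1 = np.array([[31.350, 1.258], [1.258, 2.489]])
S = B_s @ B_s.T                      # B R^{-1} B^T with R = I_2

# Qbar1 = 2 h P0 + gamma1 [Q1 + (gamma1 - 1) P1 B R^{-1} B^T P1]
Qbar1 = 2 * h * P0 + gamma1 * (Q1 + (gamma1 - 1) * P1 @ S @ P1)
print(np.round(Qbar1, 3))
```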


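Using this feedback gain, the updated system is given by

$$A := A - B[0_{2\times3}, \bar{K}_1] = \begin{bmatrix} \bar{A}_1 & \Pi_1 \\ 0_{2\times3} & A_{c1} \end{bmatrix} \tag{51b}$$

$$= \begin{bmatrix}
\begin{pmatrix} 0.0000 & -0.0091 & 4.5005 \\ -0.0160 & -8.0150 & -0.0099 \\ -1.1137 & 0.0005 & -2.0037 \end{pmatrix} & \begin{pmatrix} -12.0720 & 2.6690 \\ 0.0251 & 1.2002 \\ 2.7169 & 0.7320 \end{pmatrix} \\
0_{2\times3} & \begin{pmatrix} -4.6284 & -1.2149 \\ 14.6859 & -3.7764 \end{pmatrix}
\end{bmatrix} \tag{51c}$$

The updated feedback gain $K_c$ and weighting matrix $Q$ are

$$K_c := K_c + [0_{2\times3}, \bar{K}_1](M^{(1)})^{-1} = \begin{bmatrix} 4.833 & -1.188 & 0.969 & 0.000 & 0.484 \\ -0.021 & -1.197 & -0.003 & -0.001 & -0.001 \end{bmatrix} \tag{51d}$$

$$Q := Q + [(M^{(1)})^{-1}]^{T}\operatorname{block\,diag}[0_3, \bar{Q}_1](M^{(1)})^{-1} = \begin{bmatrix}
21.332 & 1.135 & 4.271 & 0.006 & 2.131 \\
1.135 & 2.588 & 0.225 & 0.002 & 0.112 \\
4.271 & 0.225 & 0.855 & 0.001 & 0.427 \\
0.006 & 0.002 & 0.001 & 0.000 & 0.001 \\
2.131 & 0.112 & 0.427 & 0.001 & 0.213
\end{bmatrix} \tag{51e}$$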


The solution of the Lyapunov equation in (44b) for $i = 1$ and $n_s = 2$ is

$$L_1 = \begin{bmatrix} 0.998 & -0.891 \\ -0.552 & 0.129 \\ -1.252 & -0.114 \end{bmatrix} \tag{51f}$$


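and thus the transformation matrix $M_2^{(1)}$ that block-diagonalizes $A$ in (51c) and swaps the blocks $A_{c1}$ and $\bar{A}_1$ is given by

$$M_2^{(1)} = \begin{bmatrix} L_1 & I_3 \\ I_2 & 0_{2\times3} \end{bmatrix} \tag{51g}$$

The transformed matrices are now given by

$$A := (M_2^{(1)})^{-1}AM_2^{(1)} = \begin{bmatrix} A_{c1} & 0_{2\times3} \\ 0_{3\times2} & \bar{A}_1 \end{bmatrix} \tag{52a}$$

$$B := (M_2^{(1)})^{-1}B = \begin{bmatrix}
0.9990 & -0.0021 \\
-1.6665 & -1.6689 \\
0.0153 & -1.7332 \\
0.7648 & 1.2174 \\
0.5031 & 0.9711
\end{bmatrix} \tag{52b}$$

The accumulated transformation matrix becomes $M^{(1)} := M^{(1)}M_2^{(1)}$.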

Now we proceed to the second stage of design. Choose $h_2 = 1.1$. The transformation matrix $M_1^{(2)}$ that block-diagonalizes the block $\bar{A}_1$ in (51c) while preserving the block $A_{c1}$ is given by (as in (40b))

$$M_1^{(2)} = \begin{bmatrix} I_2 & 0_{2\times3} \\ 0_{3\times2} & [S_2, S_1] \end{bmatrix} \tag{52c}$$

where

$$[S_2, S_1] = \left[\begin{array}{r|rr}
0.0011 & 1.0000 & 0.0000 \\
1.0000 & -0.0020 & -0.0001 \\
0.0001 & 0.0000 & 1.0000
\end{array}\right] \tag{52d}$$

The submatrices $S_2 \in \mathbb{R}^{3\times1}$ and $S_1 \in \mathbb{R}^{3\times2}$ can be found from (37) with respect to $\bar{A}_1$ and $h_2$. The transformed matrices $A$ and $B$ are

$$A := (M_1^{(2)})^{-1}AM_1^{(2)} = \operatorname{block\,diag}[A_{c1}, A_f, A_s] = \operatorname{block\,diag}\left[\begin{pmatrix} -4.6284 & -1.2149 \\ 14.6859 & -3.7764 \end{pmatrix},\ (-8.0150),\ \begin{pmatrix} 0.0001 & 4.5005 \\ -1.1137 & -2.0040 \end{pmatrix}\right] \tag{53a}$$

$$B := (M_1^{(2)})^{-1}B = \begin{bmatrix} B_c \\ B_f \\ B_s \end{bmatrix} = \begin{bmatrix}
0.9990 & -0.0021 \\
-1.6665 & -1.6689 \\
0.7649 & 1.2140 \\
0.0145 & -1.7346 \\
0.5030 & 0.9709
\end{bmatrix} \tag{53b}$$

Again, the accumulated transformation matrix becomes $M^{(2)} := M^{(1)}M_1^{(2)}$. The subsystem to be designed at this stage is $(A_s, B_s)$. Following the same procedure as in the first stage, we obtain the designed continuous-time subsystem as

$$A_{c2} = \begin{bmatrix} -0.1156 & 7.0817 \\ -1.2639 & -4.2294 \end{bmatrix} \tag{53c}$$

with $\sigma(A_{c2}) = \{-2.1725 \pm j2.1725\}$. Note that these eigenvalues are within the hatched region of Fig. 1. The continuous-time feedback gain $\bar{K}_2$ and weighting matrix $\bar{Q}_2$ at this stage are


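$$\bar{K}_2 = \begin{bmatrix} 0.4205 & 1.5278 \\ -0.0632 & 1.5008 \end{bmatrix} \tag{53d}$$

$$\bar{Q}_2 = \begin{bmatrix} 12.0109 & 3.3163 \\ 3.3163 & 9.2675 \end{bmatrix} \tag{53e}$$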


The  updated  feedback  gain  Kc  and  weighting  matrix  Q  are  given  below: 


=  1 

+  (02x3) 

-1 

'6.74659 

-0.63902 

2.79527 

0.20152 

0.63200 

1.85769 

-1.08109 

1.88639 

-0.99297 

0.19219 

■35.87594 

6.16143 

17.95729 

3.93400 

3.16016- 

6.16143 

4.97772 

4.80710 

3.19066 

0.39323 

= 

17.95729 

4.80710 

13.76777 

3.28138 

1.41166 

3.93400 

3.19066 

3.28138 

6.20283 

0.07071 

.  3.16016 

3.93227 

1.41166 

0.07071 

0.29412. 

(53/) 


(53  g) 
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where  tr(Q)  =  {48.298,  7.506,  3.368,  0.000,  1.947}  and  Q  >  0. 

The  eigenvalues  of  A  —  £[02x3,  Aa]  with  A  and  B  as  in  (53a)  and  (536)  re  {—4. 2024  ± 
j'4.2024,  —2. 1725  ±j2. 1725,  —8.0150}.  Note  that  all  of  them  are  within  tht  hatched  region 
of  Fig.  1,  and  the  non-dominant  eigenvalue  of  the  open-loop  system  at  —8.0150  is  keep 
invariant.  Therefore,  the  closed-loop  continuous-time  system  is 

A_c = A - BK_c = \begin{bmatrix} -4.91622 & -1.86269 & -1.61950 & -0.10620 & 0.36880 \\ 21.00416 & -2.67002 & 9.13548 & -1.31880 & 2.03420 \\ -2.07080 & 1.61251 & -2.72963 & -1.09959 & -0.29746 \\ -4.58264 & 1.27476 & -0.15235 & -1.26071 & -0.76142 \\ -12.04750 & -2.23749 & -5.80596 & 1.21822 & -9.18828 \end{bmatrix}   (54a)

The continuous-time optimal regulator is given by

u_c(t) = -K_c x_c(t) + E_c r(t)   (54b)

where K_c is the total feedback gain in (53f), E_c = I_2, and r(t) is any reference input.
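The matrices above were recovered from a scanned layout, so a cheap sanity check is useful. The trace of the closed-loop matrix A_c printed above should equal the sum of the closed-loop eigenvalues listed in the text. Likewise, the trace of Q should equal the sum of σ(Q). The sketch below performs both checks in plain Python. It only tests the printed digits and is not part of the design procedure.

```python
# Diagonal entries of the printed closed-loop matrix A_c and of the weighting matrix Q.
Ac_diag = [-4.91622, -2.67002, -2.72963, -1.26071, -9.18828]
Q_diag = [35.87594, 4.97772, 13.76777, 6.20283, 0.29412]

# Eigenvalues quoted in the text for A_c (closed loop) and for Q.
eig_Ac = [complex(-4.2024, 4.2024), complex(-4.2024, -4.2024),
          complex(-2.1725, 2.1725), complex(-2.1725, -2.1725), complex(-8.0150, 0.0)]
eig_Q = [48.298, 7.506, 3.368, 0.000, 1.947]

# The trace of a square matrix equals the sum of its eigenvalues.
trace_Ac, sum_eig_Ac = sum(Ac_diag), sum(eig_Ac).real
trace_Q, sum_eig_Q = sum(Q_diag), sum(eig_Q)
print(f"tr(A_c) = {trace_Ac:.4f}, sum of eigenvalues = {sum_eig_Ac:.4f}")
print(f"tr(Q)   = {trace_Q:.4f}, sum of eigenvalues = {sum_eig_Q:.4f}")
```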

The digitally redesigned closed-loop system will be of the form shown in (22), with the digital state-feedback gain K_d and forward gain E_d in (47) to be determined. With G and H as in (48a), E_c = I_2, and K_c as in (53f), the gains K_d in (34a) and E_d in (34b) can be evaluated as follows:

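K_d = \tfrac{1}{2}\left(I_2 + \tfrac{1}{2}K_cH\right)^{-1} K_c (I_5 + G) = \begin{bmatrix} 2.82346 & -0.82731 & 1.05826 & 0.17590 & 0.26367 \\ 0.10625 & -0.85000 & 0.81854 & -0.85244 & 0.02810 \end{bmatrix}   (55a)

E_d = \left(I_2 + \tfrac{1}{2}K_cH\right)^{-1} E_c = \begin{bmatrix} 0.56005 & -0.20157 \\ -0.09247 & 0.75738 \end{bmatrix}   (55b)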
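The gain formulas in (55) only need a small linear solve. The sketch below implements them in the form K_d = ½(I + ½K_cH)^{-1}K_c(I + G) and E_d = (I + ½K_cH)^{-1}E_c, where I is the identity of matching size. G and H of (48a) are not reproduced in this section, so the usage line uses hypothetical one-state, one-input data purely to exercise the formulas. Matrices are stored as lists of rows and inverted by Gauss-Jordan elimination, which is adequate for the small sizes involved.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def lincomb(a, X, b, Y):
    """Return a*X + b*Y elementwise."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(M)
    aug = [list(row) + e for row, e in zip(M, identity(n))]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def digital_gains(G, H, Kc, Ec):
    """K_d = 1/2 (I + 1/2 Kc H)^{-1} Kc (I + G),   E_d = (I + 1/2 Kc H)^{-1} Ec."""
    m, n = len(Kc), len(G)
    M_inv = inverse(lincomb(1.0, identity(m), 0.5, matmul(Kc, H)))
    Kd = [[0.5 * v for v in row]
          for row in matmul(matmul(M_inv, Kc), lincomb(1.0, identity(n), 1.0, G))]
    Ed = matmul(M_inv, Ec)
    return Kd, Ed


# Hypothetical one-state, one-input data, used only to exercise the formulas.
Kd, Ed = digital_gains(G=[[0.8]], H=[[0.2]], Kc=[[1.0]], Ec=[[1.0]])
print(Kd, Ed)
```

Note that the same factor (I + ½K_cH)^{-1} appears in both gains, so it is inverted once and reused.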


The designed closed-loop digital system matrix in (22) with the gains in (55) is

\hat{G} = G - HK_d = \begin{bmatrix} 0.05504 & -0.20135 & -0.29270 & 0.04082 & -0.00993 \\ 1.77051 & 0.32097 & 0.68385 & -0.22921 & 0.17871 \\ 0.10170 & 0.18270 & 0.69633 & -0.18651 & 0.00660 \\ -0.15606 & 0.24076 & 0.20385 & 0.72056 & -0.04321 \\ -0.93739 & -0.01784 & -0.40271 & 0.16539 & 0.10576 \end{bmatrix}
where σ(Ĝ) = {0.2582 ± j0.4070, 0.5905 ± j0.2718, 0.2013}. These eigenvalues are close to the digitized continuous-time optimal eigenvalues, {0.2879 ± j0.3215, 0.5874 ± j0.2726, 0.2013}, of the system matrix A − BK_c.
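The "digitized" eigenvalues quoted above are the continuous-time closed-loop eigenvalues mapped by z = e^{sT}. The sampling period is not restated in this section. The value T = 0.2 used below is inferred from the invariant pole, since e^{−8.0150·0.2} ≈ 0.2013, and should be read as an assumption. With it, the sketch reproduces the listed set.

```python
import cmath

T = 0.2  # sampling period inferred from exp(-8.0150 * T) = 0.2013 (an assumption)

# Closed-loop continuous-time eigenvalues of A - B K_c (upper half-plane members only).
s_eigs = [complex(-4.2024, 4.2024), complex(-2.1725, 2.1725), complex(-8.0150, 0.0)]

# Map each s-plane eigenvalue to the z-plane: z = exp(s T).
z_eigs = [cmath.exp(s * T) for s in s_eigs]
for s, z in zip(s_eigs, z_eigs):
    print(f"s = {s.real:+.4f}{s.imag:+.4f}j  ->  z = {z.real:.4f}{z.imag:+.4f}j")
```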

The simulations of the closed-loop systems in (5) and (22), with r(t) a unit-step vector, are shown in Fig. 3. All the discrete states x_d(kT) closely match the continuous-time states x_c(t) at t = kT. Fig. 5 shows the simulations of two control laws. The first is the continuous-time quadratic regulator in (3), with E_c = I_2 and K_c as in (53f). The second is the discrete-time control law in (47), with K_d and E_d as in (55). The continuous function u_c(t) in (3) closely matches the discrete function u_d(kT) in (47). The same simulation results have been obtained by using the approximated digital gains in (34c) and (34d). The simulations were also carried out with the approximated digital feedback and forward gains given in (36), as shown in Fig. 4. In this case, a rather large discrepancy occurs in the transient region because of the roughly approximated digital gains in (36). It is interesting to note that applying the digitized u_c(t) in (54b), with K_c as in (53f) and E_c = I_2, directly to the system in (48a) results in an unstable response.

Since all the designed digital states closely match the continuous-time optimal states, and the designed digital regulator closely matches the continuous-time quadratic regulator, the designed digital regulator in (47) with K_d and E_d as in (55) can be termed a pseudo-continuous-time quadratic regulator.

6.  Conclusion 

This paper has considered the design of large-scale discrete-time systems that do not explicitly exhibit a two- or multi-time-scale structure. It has been shown that a large-scale pseudo-continuous-time system can be decomposed into a completely decoupled multi-time-scale structure (block-diagonalization) using techniques based on the matrix sign function. The decomposition does not explicitly use the open-loop eigenvalues of the given system. A pseudo-continuous-time state-space method, based on model conversions, has been developed for methodically designing each subsystem (corresponding to one time scale), with eigenvalue placement near a desired region of the complex z-plane. The model conversions and various other computations can be performed using fast and stable algorithms based on the principal nth root of the system matrix and the matrix sign function. A new digital redesign technique based on matching all the states at all the sampling instants has been developed for finding the pseudo-continuous-time regulator with appropriate pole assignment. With an appropriately chosen sampling period T, the designed discrete controller is suboptimal, while its associated continuous-time controller is optimal with respect to certain weighting matrices. The proposed method requires the solution of only low-order Riccati equations at each stage of the design. It also avoids transformation to a general canonical form to determine the discrete feedback gain. The developed state-space method can be used to design multivariable digital control systems with state-feedback pole-placement controllers. By contrast, the existing pseudo-continuous-time frequency-domain method [6] can only be applied to design single-variable digital control systems with cascaded controllers.
Appendix

A-1 Principal nth root of a matrix [21,24]

Definition A.1: Let a matrix A ∈ C^{m×m} have an eigenspectrum σ(A) = {λ_i, i = 1, ..., m}, with λ_i ≠ 0 and arg(λ_i) ≠ π. Then, the principal nth root of A is defined as \sqrt[n]{A} ∈ C^{m×m}, where n is a positive integer, and

(a) (\sqrt[n]{A})^n = A

(b) arg(σ(\sqrt[n]{A})) ∈ (−π/n, +π/n).

A generalized fast and stable algorithm with kth order convergence has been derived in [21,24] for computing the principal nth root of a given complex matrix A ∈ C^{m×m}. The algorithm corresponding to quadratic convergence (k = 2) is listed below.

G(k+1) = G(k)\left([2I_m + (n-2)G(k)]\,[I_m + (n-1)G(k)]^{-1}\right)^{n}, \quad G(0) = A, \quad \lim_{k\to\infty} G(k) = I_m   (A.1.a)

R(k+1) = R(k)\,[2I_m + (n-2)G(k)]^{-1}\,[I_m + (n-1)G(k)], \quad R(0) = I_m, \quad \lim_{k\to\infty} R(k) = \sqrt[n]{A}   (A.1.b)
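The coupled iteration (A.1) is straightforward to code, and a plain-Python sketch for small real or complex matrices follows. It relies on the fact that every iterate is a rational function of A, so all factors commute. It stops when G(k) is within a tolerance of I_m. The helper names and the stopping rule are illustrative choices, not taken from [21,24]. A must satisfy the conditions of Definition A.1.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def lincomb(a, X, b, Y):
    """Return a*X + b*Y elementwise."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(M)
    aug = [list(row) + e for row, e in zip(M, identity(n))]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def matpow(M, p):
    """M raised to a non-negative integer power p by repeated multiplication."""
    R = identity(len(M))
    for _ in range(p):
        R = matmul(R, M)
    return R


def principal_root(A, n, tol=1e-13, max_iter=100):
    """Principal nth root of A via the coupled iteration (A.1.a)-(A.1.b).

    All iterates are rational functions of A, so the factors commute and can be
    multiplied in any order.
    """
    m = len(A)
    I = identity(m)
    G = [list(row) for row in A]   # G(0) = A
    R = identity(m)                # R(0) = I_m
    for _ in range(max_iter):
        num = lincomb(1.0, I, n - 1.0, G)     # I_m + (n-1) G(k)
        den = lincomb(2.0, I, n - 2.0, G)     # 2 I_m + (n-2) G(k)
        C = matmul(inverse(den), num)         # R(k+1) = R(k) C
        R = matmul(R, C)
        G = matmul(G, matpow(inverse(C), n))  # G(k+1) = G(k) C^{-n}
        if max(abs(G[i][j] - I[i][j]) for i in range(m) for j in range(m)) < tol:
            break
    return R


print(principal_root([[5.0, 4.0], [4.0, 5.0]], 2))   # principal square root
print(principal_root([[8.0, 0.0], [0.0, 27.0]], 3))  # principal cube root
```

For n = 2 the update reduces to G(k+1) = 4G(k)[I_m + G(k)]^{-2} and R(k+1) = R(k)[I_m + G(k)]/2, which makes the quadratic convergence easy to see on scalars.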

A-2 Matrix sign function [16]

The matrix sign function of a matrix A ∈ C^{m×m} [16,18,21] is defined as

\mathrm{sign}(A) = A\left(\sqrt{A^2}\right)^{-1} = A^{-1}\sqrt{A^2}   (A.2)

where the matrix \sqrt{A^2} denotes the principal square root of A^2. A fast and stable algorithm [21] to compute the matrix sign function is listed below. For k = 0, 1, ...,

P_j(k) = P_{j-1}(k) + S^{-2}(k)\,Q_{j-1}(k), \quad P_1(k) = I_m, \quad Q_j(k) = P_{j-1}(k) + Q_{j-1}(k), \quad Q_1(k) = I_m, \quad j = 2, \ldots, r   (A.3.a)

S(k+1) = S(k)\,Q_r^{-1}(k)\,P_r(k), \quad S(0) = A, \quad \lim_{k\to\infty} S(k) = \mathrm{sign}(A)   (A.3.b)

where r is the order of the desired rate of convergence.
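A direct transcription of (A.3) for small matrices is sketched below, with r selecting the order of convergence. For r = 2 the recursion reduces to Newton's iteration S ← (S + S^{-1})/2. For r = 3 it gives the cubically convergent S(S² + 3I)(3S² + I)^{-1}. The sketch reads the recursion with S^{-2}(k) multiplying Q_{j−1}(k), the reading under which r = 2 is Newton's method. The helpers and the stopping test on successive iterates are illustrative choices. A must have no eigenvalues on the imaginary axis.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def lincomb(a, X, b, Y):
    """Return a*X + b*Y elementwise."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(M)
    aug = [list(row) + e for row, e in zip(M, identity(n))]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def sign_function(A, r=2, tol=1e-13, max_iter=100):
    """Matrix sign function by the r-th order recursion (A.3.a)-(A.3.b)."""
    m = len(A)
    S = [list(row) for row in A]                       # S(0) = A
    for _ in range(max_iter):
        S_inv = inverse(S)
        S_inv2 = matmul(S_inv, S_inv)                  # S^{-2}(k)
        P, Q = identity(m), identity(m)                # P_1(k) = Q_1(k) = I_m
        for _j in range(2, r + 1):
            # P_j = P_{j-1} + S^{-2} Q_{j-1},  Q_j = P_{j-1} + Q_{j-1} (both from old P, Q)
            P, Q = lincomb(1.0, P, 1.0, matmul(S_inv2, Q)), lincomb(1.0, P, 1.0, Q)
        S_next = matmul(matmul(S, inverse(Q)), P)      # S(k+1) = S(k) Q_r^{-1}(k) P_r(k)
        change = max(abs(S_next[i][j] - S[i][j]) for i in range(m) for j in range(m))
        S = S_next
        if change < tol:
            break
    return S


print(sign_function([[2.0, 1.0], [0.0, -3.0]]))        # Newton (r = 2)
print(sign_function([[2.0, 1.0], [0.0, -3.0]], r=3))   # cubic (r = 3)
```

For the triangular example the exact result is [[1, 0.4], [0, −1]]: it squares to the identity, commutes with A, and has eigenvalues equal to the signs of the real parts of 2 and −3.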
A-3 Solving Riccati equation via matrix sign function

The Riccati equation for the controllable continuous-time system (A, B) with weighting matrices Q (≥ 0) and R (> 0) is given by

PBR^{-1}B^TP - A^TP - PA - Q = 0   (A.4.a)

The steady-state solution of this Riccati equation, P (> 0) with (Q, A) detectable, can be computed easily using the properties of the matrix sign function [4,17] or the eigenvalue-eigenvector approach [7]. Consider the Hamiltonian associated with the given system

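H = \begin{bmatrix} A & -BR^{-1}B^T \\ -Q & -A^T \end{bmatrix}   (A.4.b)

The following algorithm can be utilized to obtain the solution P:

H_{k+1} = \tfrac{1}{2}\left[H_k + H_k^{-1}\right], \quad H_0 = H, \quad \text{and} \quad \lim_{k\to\infty} H_k = \mathrm{sign}(H)   (A.5.a)

Let

\mathrm{sign}^{+}(H) = \tfrac{1}{2}\left[I_{2n} + \mathrm{sign}(H)\right]   (A.5.b)

Construct a block modal matrix X as

X = \left[\mathrm{ind}\left(\mathrm{sign}^{+}(H)\right),\; \mathrm{ind}\left(I_{2n} - \mathrm{sign}^{+}(H)\right)\right] = \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix}   (A.6.a)

where ind(·) represents the collection of the linearly independent column vectors of (·), and X_{12}, X_{22} are n × n blocks. The columns of I_{2n} − sign^+(H) = ½[I_{2n} − sign(H)] span the invariant subspace of H associated with its stable eigenvalues. Then, we have

P = X_{22}X_{12}^{-1}   (A.6.b)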
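The steps (A.4) to (A.6) can be strung together in a short plain-Python sketch. It builds H, runs the Newton iteration (A.5.a), forms I_{2n} − sign^+(H) = ½[I_{2n} − sign(H)], and returns P = X_{22}X_{12}^{-1}. As a simplification of ind(·), the sketch assumes that the first n columns of that projector are linearly independent and uses them as [X_{12}; X_{22}]. This holds for the two test systems used here: a scalar plant, and a double integrator with Q = I and R = 1, whose stabilizing solution is [[√3, 1], [1, √3]]. It is not guaranteed in general, and a rank-revealing column selection would then be needed. Helper names are illustrative.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def lincomb(a, X, b, Y):
    """Return a*X + b*Y elementwise."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def transpose(M):
    return [list(col) for col in zip(*M)]


def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(M)
    aug = [list(row) + e for row, e in zip(M, identity(n))]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def riccati_sign(A, B, Q, R, tol=1e-13, max_iter=100):
    """Stabilizing solution of P B R^{-1} B^T P - A^T P - P A - Q = 0 via sign(H)."""
    n = len(A)
    BRB = matmul(matmul(B, inverse(R)), transpose(B))
    top = [list(A[i]) + [-v for v in BRB[i]] for i in range(n)]
    bottom = [[-v for v in Q[i]] + [-A[j][i] for j in range(n)] for i in range(n)]
    Hk = top + bottom                                # Hamiltonian (A.4.b)
    for _ in range(max_iter):                        # (A.5.a): H_{k+1} = (H_k + H_k^{-1}) / 2
        Hn = lincomb(0.5, Hk, 0.5, inverse(Hk))
        change = max(abs(Hn[i][j] - Hk[i][j]) for i in range(2 * n) for j in range(2 * n))
        Hk = Hn
        if change < tol:
            break
    # I_{2n} - sign^+(H) = (I_{2n} - sign(H)) / 2 spans the stable invariant subspace.
    proj = lincomb(0.5, identity(2 * n), -0.5, Hk)
    X12 = [row[:n] for row in proj[:n]]   # first n columns, top block (assumed independent)
    X22 = [row[:n] for row in proj[n:]]   # first n columns, bottom block
    return matmul(X22, inverse(X12))     # (A.6.b)


print(riccati_sign([[1.0]], [[1.0]], [[1.0]], [[1.0]]))                       # 1 + sqrt(2)
print(riccati_sign([[0.0, 1.0], [0.0, 0.0]], [[0.0], [1.0]], identity(2), [[1.0]]))
```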

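To alleviate the problems of computing H_k^{-1}, the Hamiltonian can be transformed into a symmetric form as follows [4]:

\hat{H} = JH = \begin{bmatrix} 0_n & -I_n \\ I_n & 0_n \end{bmatrix} H = \begin{bmatrix} Q & A^T \\ A & -BR^{-1}B^T \end{bmatrix}   (A.7.a)

Then, since \hat{H}_k = JH_k and J^{-1} = −J, the algorithm in (A.5) becomes

\hat{H}_{k+1} = \tfrac{1}{2}\left[\hat{H}_k + J\hat{H}_k^{-1}J\right], \quad \hat{H}_0 = JH, \quad \text{and} \quad \lim_{k\to\infty}\left(-J\hat{H}_k\right) = \mathrm{sign}(H)   (A.7.b)

The computation of the inverse of the symmetric matrix \hat{H}_k is much simpler than computing the inverse of H_k. The Riccati solution P is again given by (A.6).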
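The symmetric variant (A.7) differs from (A.5) only in what is iterated. The sketch below runs Ĥ_{k+1} = ½[Ĥ_k + JĤ_k^{-1}J] from Ĥ_0 = JH and returns −JĤ_k as the approximation of sign(H). For brevity it uses the same general-purpose Gauss-Jordan inverse throughout. It therefore illustrates the algebra of (A.7) rather than the savings of a symmetric (for example LDLᵀ) factorization. The first usage line uses the scalar-plant Hamiltonian H = [[1, −1], [−1, −1]], whose sign is H/√2 because H² = 2I.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def lincomb(a, X, b, Y):
    """Return a*X + b*Y elementwise."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(M)
    aug = [list(row) + e for row, e in zip(M, identity(n))]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def sign_via_symmetric(H, tol=1e-13, max_iter=100):
    """sign(H) for a 2n x 2n Hamiltonian H via the symmetric iteration (A.7.b)."""
    two_n = len(H)
    n = two_n // 2
    J = [[0.0] * two_n for _ in range(two_n)]      # J = [[0_n, -I_n], [I_n, 0_n]]
    for i in range(n):
        J[i][n + i] = -1.0
        J[n + i][i] = 1.0
    Hk = matmul(J, H)                               # H_hat_0 = J H (symmetric)
    for _ in range(max_iter):
        Hn = lincomb(0.5, Hk, 0.5, matmul(matmul(J, inverse(Hk)), J))
        change = max(abs(Hn[i][j] - Hk[i][j]) for i in range(two_n) for j in range(two_n))
        Hk = Hn
        if change < tol:
            break
    return [[-v for v in row] for row in matmul(J, Hk)]   # sign(H) = lim(-J H_hat_k)


print(sign_via_symmetric([[1.0, -1.0], [-1.0, -1.0]]))
```

The result can be fed to the projector step of (A.5.b) and (A.6) exactly as in the non-symmetric algorithm.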




[Figure: the region of interest in the continuous-time s-plane.]

[Figure: the region of interest in the discrete-time z-plane.]

Fig. 4. Comparison of the state trajectories of x_c(t) and x_d(t) with K_d and E_d in equ. (27).

Fig. 5. Control signals u_c(t) in equ. (54b) and u_d(kT) in equ. (47).
NEW METHODOLOGIES IN RENEWAL THEORY


B.  D.  Sivazlian 
ABSTRACT

The  derivation  of  the  probability  law  (joint  distribution  function) 
of  a  renewal  counting  process,  and  the  analysis  of  the  filtered  renewal 
process,  form  the  theoretical  basis  1)  to  study  the  prediction  problem 
associated  with  systems  regulated  by  these  processes,  and  2)  to 
formulate  and  solve  a  number  of  applied  problems  arising  in  reliability, 
replacement,  maintenance,  queueing,  production  and  other  pertinent  areas 
of  interest  to  engineering,  operations  research  and  military  systems. 

The  research  addresses  itself  to  both  theory  and  applications. 


I. INTRODUCTION

As  an  important  branch  of  stochastic  processes,  renewal  theory  has 
found  many  useful  applications  in  statistics  and  in  the  mathematical 
modeling  of  natural  and  man-created  phenomena,  particularly  in  solving 
complex  problems  in  operations  research  such  as  inventory,  queueing, 
reliability  and  replacement.  As  a  process  generalizing  the  Poisson 
process  and  all  its  ramifications,  renewal  theory  has  found  applications 
in  such  fields  as  actuarial  sciences,  astronomy,  astrophysics,  ecology, 
economics,  engineering,  meteorology  and  physics. 
The literature on renewal processes is at least 50 years old. Lotka's paper (1939) on "... self-renewing aggregates ..." contains a list of 74 papers on the renewal equation and its applications. The earliest dates back to 1909, when Herbelot encountered the equation while investigating an actuarial problem. In his fundamental work, Feller (1941) was the first to study formally the integral equation of renewal theory. Smith (1958) provides a thorough review of renewal theory. Cox (1962) discusses many theoretical and applied problems in the area. Daley and Vere-Jones (1988) and Wolff (1988) present a more modern approach to the theory. Without elaborating further on the existing literature, it suffices to say that renewal theory is a standard topic covered by most textbooks on stochastic processes and their applications.

Renewal theory has found a very fruitful area of application in modeling complex systems in reliability, maintainability and availability. Barlow and Proschan (1965) give examples of these uses: 1) to define the operating characteristics of maintenance policies, 2) to solve the age and block replacement problems, 3) to formulate repair problems of single- and multi-unit systems, and 4) to derive optimum inspection and maintenance policies. The theory has also been applied to solve problems in reliability arising from shock processes, cumulative damage and redundancies.

Other areas of application in operations research where renewal theory has been utilized include single- and multi-commodity inventory systems, queueing systems, and maintenance and replacement systems. More recently, diffusion approximations have been used to solve complex queueing systems, such as the machine repair problem with standbys. In that setting, renewal theory has been used to generate the infinitesimal means and variances for the diffusion equation.

This  research  proposes  the  development  of  a  unifying  methodology  to 
bring  forth  a  new  perspective  to  the  analysis  of  renewal  processes.  The 
proposed  research  aims  at:  1)  obtaining  new  results  in  the  field  such  as 
the  characteristics  of  the  probability  law  of  a  renewal  counting 
process;  2)  studying  the  theory  of  filtered  renewal  process;  3) 
developing  efficient  procedures  to  predict  the  behavior  of  systems 
governed  by  renewal  processes  for  short-term  and  medium-term  purposes, 
and  4)  using  the  results  obtained  in  a  variety  of  applied  problems 
arising  in  engineering,  operations  research  and  military  systems. 


II.  THE  THEORY  OF  RENEWAL  PROCESSES 

1.  The  Probability  Law  of  a  Renewal  Counting  Process 

The study of a stochastic process is not complete until one has characterized its probability law. In our case, this is the joint distribution function of the number of renewals at distinct time epochs t_1, t_2, ..., t_n, where 0 < t_1 < t_2 < ... < t_n (n being an arbitrary positive integer). The probability law provides all the necessary information to describe the properties of the process. Unfortunately, despite its early inception and its many uses, the probability law of a renewal counting process has so far been considered too difficult to tackle, and it thus remains an unsolved problem. Only in the special case of the Poisson process, defined as a renewal process, has this law been derived, leading to important characterizations of the process itself. For example, it may be shown that the Poisson process has stationary and independent increments. These two properties form the basis for many important applications in statistics and operations research (see, e.g., Cohen, 1982).

Let {T_i}, i = 1, 2, ..., be the sequence of interarrival times in an ordinary renewal process, assumed to be i.i.d. random variables with probability density function f(x), 0 < x < ∞, and distribution function F(x). Let {N(t), t ≥ 0} be the total number of renewals in [0, t], where N(0) = 0. Consider distinct time epochs t_1, t_2, ..., t_m, where 0 < t_1 < t_2 < ... < t_m and m is any positive integer.

The probability law of the renewal counting process may be defined by

P\{N(t_1) = n_1,\, N(t_2) = n_2,\, \ldots,\, N(t_m) = n_m\}   (1)

where 0 ≤ n_1 ≤ n_2 ≤ ... ≤ n_m. Other representations of the probability law of the process, such as the joint characteristic function, may be appropriate depending on the nature of the intended results.

We  propose  a  new  methodology  to  formulate  mathematically  the 
probability  law  of  a  renewal  counting  process  and  to  obtain  closed  form 
expressions  in  terms  of  the  distribution  of  the  interarrival  times. 

This new methodology is based on the properties of certain classes of multiple integrals. In addition, basic properties of the process can be identified and structured. Examples are the distribution of the renewal increments N(t_2) − N(t_1) and the joint distribution of the number of renewals N(t) and the forward recurrence time V(t). The results obtained will be of particular use in developing mathematical models for prediction.
2. Methodology

a. Introduction

The objective of the present section is to show that a class of multiple integrals may be used as a novel mathematical methodology to solve problems arising in renewal counting processes. Multiple integrals provide a natural vehicle for approaching these complex problems, since one is essentially dealing with sums of independent interarrival times. The class of multiple integrals typically arising in these problems is of the generalized Liouville type (Sivazlian, 1971). For example, we use this methodology to provide a new derivation of the distribution of the number of renewals in an ordinary renewal process. The primary emphasis is to demonstrate the use of multiple integrals as a method of analysis and solution, rather than to derive the specific intended result in the fewest steps. The existing literature offers a much shorter method for obtaining this result. It relies, however, on restrictive event arguments relating waiting times to the number of renewals (see, e.g., Cox, 1962). The present derivation is based on the joint distribution of the number of renewals and the interarrival times. Moreover, the intent is to suggest a methodology that could be used to solve more complex problems in renewal theory. Examples are 1) deriving the probability law of a renewal counting process, 2) characterizing the statistical properties of the filtered renewal process, and 3) providing a basis for predicting the behavior of systems of the renewal type.

We  first  state  a  result  in  multiple  integrals. 




b. A Result in Multiple Integrals

Theorem

For x > 0, let g(x) ∈ 𝒞 (i.e., continuous). Let φ_i(x) ∈ 𝒦, i = 1, 2, ..., n, where n is a positive integer (Mikusinski, 1959). Here 𝒦 is the class of functions with at most a finite number of points of discontinuity in every finite interval and such that the integral ∫_0^x |φ_i(u)| du has a finite value for every x > 0. Then

\int\!\cdots\!\int_R g(x_1 + x_2 + \cdots + x_n)\,\varphi_1(x_1)\varphi_2(x_2)\cdots\varphi_n(x_n)\,dx_1\,dx_2\cdots dx_n = \int_0^t g(u)\,\left[\varphi_1 * \varphi_2 * \cdots * \varphi_n\right](u)\,du   (2)

where R = {x : 0 < x_1 + x_2 + ... + x_n ≤ t, x_i > 0}, and where the integrand of the single integral on the right-hand side is a function of class 𝒦. Here the notation * refers to the usual convolution operation. (For a proof, see Sivazlian, 1971.)
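For n = 2, (2) says that a double integral over the triangle x_1 + x_2 ≤ t collapses to a single integral against the convolution φ_1 * φ_2. The sketch below checks this numerically with midpoint rules. It uses the illustrative (hypothetical) choices g(u) = cos u, φ_1(x) = e^{−x} and φ_2(x) = 1, for which (φ_1 * φ_2)(u) = 1 − e^{−u}. It is a numerical illustration of the identity, not part of its proof.

```python
import math


def midpoint(fun, a, b, n=2000):
    """Composite midpoint rule for the integral of fun over [a, b]."""
    h = (b - a) / n
    return h * sum(fun(a + (k + 0.5) * h) for k in range(n))


def g(u):
    return math.cos(u)            # hypothetical continuous g


def phi1(x):
    return math.exp(-x)           # hypothetical phi_1


def phi2(x):
    return 1.0                    # hypothetical phi_2


def conv12(u):
    return 1.0 - math.exp(-u)     # (phi_1 * phi_2)(u) = integral_0^u e^{-v} dv


t = 2.0
# Left side of (2) for n = 2: region x1 + x2 <= t, integrated as x2 in [0, t - x1], x1 in [0, t].
lhs = midpoint(lambda x1: phi1(x1) * midpoint(lambda x2: g(x1 + x2) * phi2(x2), 0.0, t - x1, 400),
               0.0, t, 400)
# Right side of (2): a single integral of g against the convolution.
rhs = midpoint(lambda u: g(u) * conv12(u), 0.0, t)
# Closed form of the right side for these choices.
exact = math.sin(t) - 0.5 - 0.5 * math.exp(-t) * (math.sin(t) - math.cos(t))
print(lhs, rhs, exact)
```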

c. The Distribution of the Number of Renewals

Let {T_i}, i = 1, 2, ..., be the sequence of interarrival times in an ordinary renewal process, assumed to be independently and identically distributed random variables with probability density function f(x), 0 < x < ∞, and distribution function F(x). Let {N(t), t ≥ 0} be the total number of renewals in [0, t], where N(0) = 0. The joint distribution of N(t) and T_1, T_2, ..., T_{N(t)+1} is:

a. For N(t) = 0, t > 0:

P\{N(t) = 0,\, T_1 > t\} = 1 - F(t)   (3)

b. For N(t) = n ≥ 1, 0 < x_1 + x_2 + ... + x_n ≤ t:

P\{N(t) = n,\, x_1 < T_1 \le x_1 + dx_1,\, x_2 < T_2 \le x_2 + dx_2,\, \ldots,\, x_n < T_n \le x_n + dx_n,\, T_{n+1} > t - (x_1 + x_2 + \cdots + x_n)\}
= f(x_1)f(x_2)\cdots f(x_n)\,\{1 - F[t - (x_1 + x_2 + \cdots + x_n)]\}\,dx_1\,dx_2\cdots dx_n   (4)

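Thus, the probability mass function of N(t) is:

For N(t) = 0:

P\{N(t) = 0\} = 1 - F(t)

For N(t) = n ≥ 1:

P\{N(t) = n\} = \int\!\cdots\!\int_{0 < x_1 + \cdots + x_n \le t} f(x_1)f(x_2)\cdots f(x_n)\,\{1 - F[t - (x_1 + \cdots + x_n)]\}\,dx_1\,dx_2\cdots dx_n = \int_0^t f^{*(n)}(u)\,\{1 - F(t - u)\}\,du = F^{(n)}(t) - F^{(n+1)}(t)   (5)

which is a well-known result. Here f^{*(n)} denotes the n-fold convolution of f and F^{(n)} the corresponding distribution function. The second equality follows from (2) with g(u) = 1 − F(t − u) and φ_i = f. The third follows from F^{(n)}(t) = ∫_0^t f^{*(n)}(u) du and F^{(n+1)}(t) = ∫_0^t f^{*(n)}(u)F(t − u) du.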
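As an illustration of (5), take exponential interarrival times with rate λ (a hypothetical choice). Then F^{(n)} is the Erlang distribution function, F^{(n)}(t) = 1 − Σ_{k=0}^{n−1} e^{−λt}(λt)^k/k!, with the convention F^{(0)} = 1. The difference F^{(n)}(t) − F^{(n+1)}(t) should therefore reduce to the Poisson probability e^{−λt}(λt)^n/n!, as the sketch below confirms numerically.

```python
import math


def erlang_cdf(n, lam, t):
    """F^{(n)}(t): distribution function of a sum of n i.i.d. Exp(lam) variables (F^{(0)} = 1)."""
    x = lam * t
    return 1.0 - sum(math.exp(-x) * x ** k / math.factorial(k) for k in range(n))


def renewal_pmf(n, lam, t):
    """P{N(t) = n} = F^{(n)}(t) - F^{(n+1)}(t), as in (5)."""
    return erlang_cdf(n, lam, t) - erlang_cdf(n + 1, lam, t)


lam, t = 1.5, 2.0
for n in range(6):
    poisson = math.exp(-lam * t) * (lam * t) ** n / math.factorial(n)
    print(n, renewal_pmf(n, lam, t), poisson)
```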

d. The Joint Distribution Function of the Number of Renewals

Consider two time epochs t_1 and t, where 0 < t_1 < t, and suppose that it is required to determine P{N(t_1) = n_1, N(t) = n}. Here, it is necessary to consider several cases depending on the values taken by n_1 and n (0 ≤ n_1 ≤ n).

For example, if we consider the case n_1 ≥ 1, n ≥ n_1 + 2, then P{N(t_1) = n_1, N(t) = n} will be given by:

P\{N(t_1) = n_1,\, N(t) = n\} = \int\!\cdots\!\int_R f(x_1)f(x_2)\cdots f(x_n)\,\{1 - F[t - (x_1 + x_2 + \cdots + x_n)]\}\,dx_1\,dx_2\cdots dx_n   (6)

where

R = \{x : 0 < x_1 + \cdots + x_{n_1} \le t_1 < x_1 + \cdots + x_{n_1+1} < x_1 + \cdots + x_n \le t,\; x_i > 0\}.   (7)

Clearly, the problem reduces to the evaluation of an n-tuple integral. Sivazlian (1989) develops an appropriate methodology to reduce this multiple integral to a simpler expression. That expression is more amenable to analysis and more useful in characterizing a renewal counting process.

e. The Prediction Problem in Renewal Processes

Traditionally, research in renewal theory has focused either on finding solutions to the time-dependent problem or on deriving limit theorems. The latter has constituted the majority of the work in recent years. This is quite understandable, since the solution to a time-dependent problem is often not easily obtainable. Limit theorems may sometimes be used to arrive at the desired solutions. Nevertheless, they are often inadequate for certain problems, particularly for short-term and medium-term predictive purposes. In this area, many of the answers would be obtained directly from the formulas expressing the probability law of the process, which provide the solution to the time-dependent problem.

Consider a system whose behavior is regulated by a renewal counting process. Let N(t) be the number of counts up to time t. Consider time epochs 0 < t_1 < t_2 < t, and suppose that the system has been operating until time t_1. For predictive purposes, one may want to compute several probability expressions, given some information about the process at time t_1. These probability expressions provide mathematical models for predicting the statistical behavior of the process at some future time t_2 > t_1, given some level of knowledge concerning the state of the process at time t_1. Depending on the circumstances, we consider three cases:

Case 1: At time t_1, only the number of renewal counts N(t_1) = n_1 is known;

Case 2: At time t_1, the time at which the last renewal count occurred is known;

Case 3: At time t_1, no information is available.

Case 1:

When at time t_1 the number of renewal counts N(t_1) = n_1 is known, the expressions of interest for predictive purposes would be the conditional probabilities:

P\{N(t) = n \mid N(t_1) = n_1\}   (8)

P\{N(t + dt) = n \mid N(t_1) = n_1\}   (9)

P\{N(t_2) = n_2,\, N(t) = n \mid N(t_1) = n_1\}   (10)

P\{N(t) - N(t_1) = m \mid N(t_1) = n_1\}   (11)

Note in particular that an expression for (9) would yield transition rates for the renewal counting process. Similarly, one may be interested in obtaining the covariance function Cov[N(t_1), N(t_2)] and the conditional expectation E[N(t) − N(t_1) | N(t_1) = n_1]. Another quantity of interest is the joint distribution function of the backward and forward recurrence times conditional on N(t_1) = n_1.


Case 2:

Suppose that at time t_1 it is known that the last renewal count occurred at time r, 0 < r < t_1. Let U_1 be the random variable defining the time elapsed from t_1 until the next renewal occurs. Since the interarrival time in progress at t_1 is known to exceed t_1 − r, the probability density function of U_1, g_{U_1}(u), is

g_{U_1}(u) = \frac{f(t_1 - r + u)}{\int_{t_1 - r}^{\infty} f(v)\,dv}, \quad 0 < u < \infty   (12)


Consider now the modified (or delayed) renewal counting process in which the arrival time until the first renewal is U_1, while the interarrival times for the subsequent renewals are still given by the sequence of i.i.d. random variables {T_i}, i = 2, 3, .... One is thus led to study the probability law of a modified renewal process. Clearly, the methodology used in Section II-1 would still be applicable here. The only change is that g_{U_1}(·) replaces f_{T_1}(·) in the final expressions obtained. Time t_1 would then be considered as the origin of the process. Expressions for the conditional probability

P\{N(t_2 - t_1) = n_2,\, N(t - t_1) = n \mid \text{last renewal occurred at } r\}, \quad 0 < r < t_1 < t_2 < t   (13)

or other probabilities could then be obtained straightforwardly.

Case 3:

In this case, no information is available about the state of the process at time t_1. This case is clearly of the same type as Case 2, except that the time until the first renewal is the forward recurrence time V(t_1). Let

H_{V(t_1)}(x) = distribution function of V(t_1);

M(y) = the renewal function for the original ordinary renewal counting process.

Then H_{V(t_1)}(x) is given by the expression:

H_{V(t_1)}(x) = F(t_1 + x) - F(t_1) + \int_0^{t_1} \left[F(t_1 + x - y) - F(t_1 - y)\right] dM(y)   (14)

The method of analysis would be similar to Case 2.
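For a Poisson process (exponential interarrivals with rate λ, so that M(y) = λy), the forward recurrence time is again exponential: H_{V(t_1)}(x) = 1 − e^{−λx}, whatever t_1. The sketch below evaluates the right-hand side of (14), in the form F(t_1 + x) − F(t_1) + ∫_0^{t_1}[F(t_1 + x − y) − F(t_1 − y)] dM(y). It uses a midpoint rule and compares the result with that closed form. The parameter values are arbitrary.

```python
import math

LAM = 0.8   # hypothetical rate of the exponential interarrival law
T1 = 3.0    # observation epoch t_1


def F(x):
    """Exponential distribution function (zero for non-positive arguments)."""
    return 1.0 - math.exp(-LAM * x) if x > 0 else 0.0


def renewal_density(y):
    """m(y) = dM(y)/dy; for a Poisson process M(y) = LAM * y, so m(y) = LAM."""
    return LAM


def H_forward(x, steps=4000):
    """Right-hand side of (14), with dM(y) = m(y) dy and a midpoint rule in y."""
    h = T1 / steps
    integral = 0.0
    for k in range(steps):
        y = (k + 0.5) * h
        integral += (F(T1 + x - y) - F(T1 - y)) * renewal_density(y) * h
    return F(T1 + x) - F(T1) + integral


for x in (0.5, 1.0, 2.0):
    print(x, H_forward(x), 1.0 - math.exp(-LAM * x))
```

For a non-Poisson process only F and the renewal density need to change, although M(y) then usually has to be computed numerically as well.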


3. A Birth Equation for the Ordinary Renewal Counting Process

Transition rates for the renewal counting process {N(t), t ≥ 0} may be derived (Sivazlian, 1989). As a result, the process can be "viewed" as a non-homogeneous, state-dependent birth-type process. Thus, {N(t), t ≥ 0} does not, in general, satisfy the Chapman-Kolmogorov equations. The unconditional distribution of N(t) nevertheless satisfies the well-known differential-difference equations of the birth type. It is shown that the solution of these equations indeed yields the well-known results

P\{N(t) = 0\} = 1 - F(t), \qquad P\{N(t) = n\} = F^{(n)}(t) - F^{(n+1)}(t), \quad n \ge 1   (15)

Here we define

F^{(n)}(t) = \int_0^t f(x)\,F^{(n-1)}(t - x)\,dx, \quad n = 1, 2, \ldots   (16)

with F^{(0)}(t) = 1 for t ≥ 0, so that

F^{(1)}(t) = F(t) = \int_0^t f(x)\,dx   (17)

and f^{*(n)} denotes the density of F^{(n)}. The results may be summarized as follows:

P\{N(t_1 + dt) = n + 1 \mid N(t_1) = n\} = \frac{f^{*(n+1)}(t_1)}{F^{(n)}(t_1) - F^{(n+1)}(t_1)}\,dt + o(dt), \quad n = 0, 1, 2, \ldots   (18)

P\{N(t_1 + dt) = n \mid N(t_1) = n\} = 1 - \frac{f^{*(n+1)}(t_1)}{F^{(n)}(t_1) - F^{(n+1)}(t_1)}\,dt + o(dt), \quad n = 0, 1, 2, \ldots   (19)

P\{N(t_1 + dt) = n \mid N(t_1) = n_1\} = o(dt), \quad n \ge n_1 + 2   (20)


It is then evident that, for 0 ≤ m ≤ n, we can write

P\{N(t + dt) = n\} = \sum_m P\{N(t + dt) = n,\, N(t) = m\} = \sum_m P\{N(t + dt) = n \mid N(t) = m\}\,P\{N(t) = m\}   (21)

Let P(n, t) = P{N(t) = n}. Substituting (18)-(20) into (21), subtracting P(n, t), dividing by dt and letting dt → 0, we find

\frac{dP(0,t)}{dt} = -\frac{f(t)}{1 - F(t)}\,P(0,t)   (22)

with the initial condition

P(0, 0) = 1   (23)

Also

\frac{dP(n,t)}{dt} = -\frac{f^{*(n+1)}(t)}{F^{(n)}(t) - F^{(n+1)}(t)}\,P(n,t) + \frac{f^{*(n)}(t)}{F^{(n-1)}(t) - F^{(n)}(t)}\,P(n-1,t), \quad n \ge 1   (24)

with the initial condition

P(n, 0) = 0, \quad n \ge 1   (25)


The  solution  of  these  equations  may  be  verified  to  yield  (15) . 
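The birth-type system (22)-(25) can also be integrated numerically and compared with (15). The sketch below does this for Erlang-2 interarrival times with rate λ, a hypothetical choice for which the rates simplify. Here f^{*(n+1)} is a gamma density of shape 2n + 2, and F^{(n)}(t) − F^{(n+1)}(t) = e^{−λt}[(λt)^{2n}/(2n)! + (λt)^{2n+1}/(2n+1)!]. The ratio in (24) therefore becomes r_n(t) = λ²t/(2n + 1 + λt). A classical fourth-order Runge-Kutta step is used. The system is truncated at a finite n_max, which does not affect the lower states because each P(n, t) depends only on P(0, t), ..., P(n, t).

```python
import math

LAM = 1.0  # hypothetical rate of the Erlang-2 interarrival distribution


def rate(n, t):
    """r_n(t) = f^{*(n+1)}(t) / [F^{(n)}(t) - F^{(n+1)}(t)] for Erlang-2 interarrivals.

    For this choice the ratio simplifies to LAM**2 * t / (2n + 1 + LAM * t).
    """
    return LAM * LAM * t / (2 * n + 1 + LAM * t)


def rhs(t, p):
    """Right-hand sides of (22) and (24): dP(n)/dt = -r_n P(n) + r_{n-1} P(n-1)."""
    out = []
    for n in range(len(p)):
        d = -rate(n, t) * p[n]
        if n > 0:
            d += rate(n - 1, t) * p[n - 1]
        out.append(d)
    return out


def solve(t_end, n_max=12, steps=2000):
    """Classical RK4 from the initial conditions (23) and (25)."""
    p = [1.0] + [0.0] * n_max
    h = t_end / steps
    t = 0.0
    for _ in range(steps):
        k1 = rhs(t, p)
        k2 = rhs(t + h / 2, [a + h / 2 * b for a, b in zip(p, k1)])
        k3 = rhs(t + h / 2, [a + h / 2 * b for a, b in zip(p, k2)])
        k4 = rhs(t + h, [a + h * b for a, b in zip(p, k3)])
        p = [a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(p, k1, k2, k3, k4)]
        t += h
    return p


def exact(n, t):
    """Closed form (15) for Erlang-2: P{Poisson(LAM t) in {2n, 2n + 1}}."""
    x = LAM * t
    return math.exp(-x) * (x ** (2 * n) / math.factorial(2 * n)
                           + x ** (2 * n + 1) / math.factorial(2 * n + 1))


p = solve(2.0)
for n in range(5):
    print(n, p[n], exact(n, 2.0))
```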


III.  THE  THEORY  OF  FILTERED  RENEWAL  PROCESS 
1.  Introduction 

The filtered renewal process is a stochastic process that generalizes and naturally extends the concept of the filtered Poisson process (in the sense of Parzen, 1962). In it, the underlying generating process is a renewal counting process rather than a Poisson process. The filtered Poisson process (sometimes loosely called the compound Poisson process) is extensively discussed in Blanc-Lapierre and Fortet (1953), Parzen (1962) and Karlin and Taylor (1981). Filtered renewal processes provide models for a large variety of random phenomena in such areas as queueing theory, physics, economics, astrophysics, and population immigration. They can be regarded as arising by means of linear operations on a renewal process, in which a response function must additionally be specified. Many problems in simple and compound renewal processes, such as the renewal reward process or the cumulative process, may be shown to be special cases of the filtered renewal process by judiciously selecting the form of the response function. Note that this response function remains the same for both the filtered renewal process and the filtered Poisson process. We now formally define the filtered renewal process, following Parzen:

A stochastic process {X(t), t ≥ 0} is said to be a filtered renewal process if it can be represented, for t ≥ 0, by

X(t) = \sum_{m=1}^{N(t)} w(t, W_m, Y_m), \quad 0 < W_1 < W_2 < \cdots < W_{N(t)} \le t   (26)

where

i) {N(t), t ≥ 0} is an ordinary renewal counting process with known i.i.d. interarrival times {T_i} and waiting times {W_i}, W_i = T_1 + T_2 + ... + T_i, i = 1, 2, ...;

ii) {Y_n} is a sequence of i.i.d. random variables, independent of {N(t), t ≥ 0}, with distribution function G_Y(·);

iii) w(t, τ, y) is a function of three variables called the response function.

For example, suppose W_m = τ_m is the time at which an event took place. Then Y_m = y represents the magnitude of a signal associated with the event. The quantity w(t, τ_m, y) represents the value at time t of a signal of magnitude y originating at time τ_m, 0 < τ_m < t. Finally, X(t) represents the value at time t of the sum of the signals arising from the events occurring in [0, t].
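Definition (26) translates directly into a simulation. The sketch below generates one path of X(t) by drawing interarrival times until the waiting time W_m exceeds t. It pairs each event with an independent mark Y_m and sums w(t, W_m, Y_m). The interarrival sampler, the mark sampler and the decaying response w(t, τ, y) = y e^{−(t−τ)} are hypothetical illustrations (shot-noise-type choices), and any others can be passed in. The usage lines estimate E[X(t)] in the Poisson case with w(t, τ, y) = y, where E[X(t)] = λtE[Y] is known.

```python
import math
import random


def filtered_value(t, waiting_times, marks, w):
    """X(t) = sum of w(t, W_m, Y_m) over the events with W_m <= t, as in (26)."""
    return sum(w(t, wm, ym) for wm, ym in zip(waiting_times, marks) if wm <= t)


def simulate_X(t, draw_T, draw_Y, w, rng):
    """One realization of X(t) for given interarrival and mark samplers."""
    waits, marks, s = [], [], 0.0
    while True:
        s += draw_T(rng)                  # W_m = T_1 + ... + T_m
        if s > t:
            break
        waits.append(s)
        marks.append(draw_Y(rng))         # Y_m drawn independently of the renewal process
    return filtered_value(t, waits, marks, w)


def shot_noise(t, tau, y):
    """Hypothetical response: a signal of size y born at tau, decaying exponentially."""
    return y * math.exp(-(t - tau))


rng = random.Random(12345)
lam, t = 2.0, 3.0
# Poisson process, uniform(0, 1) marks, w = y: E[X(t)] = lam * t * E[Y] = 3.
draws = [simulate_X(t, lambda r: r.expovariate(lam), lambda r: r.random(),
                    lambda t_, tau, y: y, rng) for _ in range(20000)]
print(sum(draws) / len(draws), lam * t * 0.5)
# Non-Poisson example: gamma(2, 0.5) interarrivals with the shot-noise response.
gamma_path = simulate_X(t, lambda r: r.gammavariate(2.0, 0.5), lambda r: r.random(), shot_noise, rng)
print(gamma_path)
```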

The theory of the filtered Poisson process has not been extended to the filtered renewal process mainly for lack of mathematical techniques. Such techniques are needed to handle the complex multiple integral expressions arising in obtaining, for example, the joint characteristic function of X(t_1), X(t_2), ..., X(t_m), 0 < t_1 < t_2 < ... < t_m (m a positive integer).

The theory of filtered renewal processes may be studied by exploiting the properties of multiple integrals developed by the present author (1971, 1983). The methodology would be extended to analyze a larger class of integrals, in order to obtain reduction formulas for evaluating these integrals. Expressions could then conceivably be obtained for various statistical characteristics of the process. These include its probability law, the joint characteristic function at distinct time epochs, the covariance function, E[X(t)], Var[X(t)] and limiting values (as t → ∞). One could also establish asymptotic normality of the filtered renewal process for given response functions.
2.  Methodology 

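For the filtered renewal process $\{X(t), t \ge 0\}$, the characteristic function is given by $E[e^{isX(t)}]$. Using the definition of $X(t)$, this is given by

$$E[e^{isX(t)}] = E\Big[\exp\Big\{is \sum_{m=1}^{N(t)} w(t, W_m, Y_m)\Big\}\Big] = E\Big[\exp\Big\{is \sum_{m=1}^{N(t)} w\Big(t, \sum_{i=1}^{m} T_i, Y_m\Big)\Big\}\Big] \qquad (27)$$

$$= \sum_{n=0}^{\infty} \int\!\!\cdots\!\!\int_{0 < x_1 + x_2 + \cdots + x_n \le t} \int_0^\infty\!\!\cdots\!\int_0^\infty \prod_{m=1}^{n} \exp\Big\{is\, w\Big(t, \sum_{i=1}^{m} x_i, y_m\Big)\Big\}\, f(x_1) f(x_2) \cdots f(x_n)\, \bar F[t - (x_1 + x_2 + \cdots + x_n)]\, dx_1\, dx_2 \cdots dx_n\, dG(y_1)\, dG(y_2) \cdots dG(y_n) \qquad (28)$$

(here $\bar F = 1 - F$ denotes the survival function of the interarrival time).

If we define

$$h(x_1 + x_2 + \cdots + x_m) = E_Y\Big\{\exp\Big[is\, w\Big(t, \sum_{j=1}^{m} x_j, Y\Big)\Big]\Big\} = \int_0^\infty \exp\Big[is\, w\Big(t, \sum_{j=1}^{m} x_j, y\Big)\Big]\, dG(y), \qquad (29)$$

then clearly the integral (28) has the form

$$E[e^{isX(t)}] = \sum_{n=0}^{\infty} \int\!\!\cdots\!\!\int_{0 < x_1 + x_2 + \cdots + x_n \le t} \prod_{i=1}^{n} h\Big(\sum_{j=1}^{i} x_j\Big) f(x_i) \cdot \bar F\Big(t - \sum_{i=1}^{n} x_i\Big) \prod_{i=1}^{n} dx_i. \qquad (30)$$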


Although there is a similarity between the multiple integrals (30) and (2), in general the integrand is not the same. It thus becomes necessary to use and extend the available methodologies (Sivazlian, 1971, 1983) to obtain new reduction formulas for evaluating integrals of the type (30). Here too, the reduced and simplified formulas thus derived would be of particular significance in characterizing the filtered renewal process and in obtaining various statistical properties related to the process.
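A direct evaluation of (30) needs the reduction formulas discussed above. As an independent sanity check, $E[e^{isX(t)}]$ can also be estimated by simulation. The sketch below assumes exponential interarrival times (so the renewal process is Poisson), Exp(1) marks, and a hypothetical rectangular response $w(t, \tau, y) = y$ for $t - \tau \le d$ and $0$ otherwise. Only in this Poisson special case does Campbell's theorem give the exact value $\exp\{\lambda \min(d, t)(\varphi_Y(s) - 1)\}$ with $\varphi_Y(s) = 1/(1 - is)$, which the estimate should approach; the function names are illustrative.

```python
import cmath
import random


def char_fn_mc(s, t, lam, d, n_samples, seed=0):
    """Monte Carlo estimate of E[exp(i s X(t))] for a hypothetical filtered renewal
    process: exponential(lam) interarrivals, Exp(1) marks, and the rectangular
    response w(t, tau, y) = y if t - tau <= d else 0."""
    rng = random.Random(seed)
    acc = 0j
    for _ in range(n_samples):
        x = 0.0
        w_m = rng.expovariate(lam)          # W_1
        while w_m <= t:
            y_m = rng.expovariate(1.0)      # mark Y_m
            if t - w_m <= d:                # signal still "on" at time t
                x += y_m
            w_m += rng.expovariate(lam)     # next waiting time
        acc += cmath.exp(1j * s * x)
    return acc / n_samples


def char_fn_poisson_exact(s, t, lam, d):
    """Campbell's formula, valid only in the Poisson case:
    exp(lam * min(d, t) * (phi_Y(s) - 1)) with phi_Y(s) = 1 / (1 - i s) for Exp(1) marks."""
    phi_y = 1.0 / (1.0 - 1j * s)
    return cmath.exp(lam * min(d, t) * (phi_y - 1.0))


est = char_fn_mc(0.7, 3.0, 2.0, 1.0, 20000, seed=0)
ref = char_fn_poisson_exact(0.7, 3.0, 2.0, 1.0)
print(est, ref)
```

Replacing the exponential draws with another interarrival law gives estimates for a genuine renewal case, against which a numerical evaluation of (30) could be compared.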


IV.  APPLICATIONS 

With a better understanding of the theoretical basis of a process, a better appreciation is gained in characterizing the process as well as in gaining insight into its various properties. A direct consequence is that the process may become amenable to a wider variety of applications, or may be used to better approximate the behavior of a system. Shortcomings in the theory invariably limit its applicability. It is hoped that the new results will open new vistas of application by solving many complex problems in renewal theory and in filtered renewal theory.

Among the application areas, we consider the following:

i. The Cumulative Damage Problem -

Consider a component subject to wear, where the number of wearout occurrences is regulated by a renewal process and, in addition, the amount of wear is also regulated by a renewal process. Suppose that the component has been operating for some time. Given the present wearout state of the component, the problem of predicting the future wearout condition (that is, the level of degradation of the component) as well as its ultimate failure (first passage time) may be addressed.

ii. The Takacs' Sojourn Problem for an Alternating Renewal Process -

This problem (see Takacs, 1957) has a variety of applications, such as: 1) the cumulative damage problem previously described; 2) the problem of a component subject to failure and repair; 3) the traffic delay problem; and 4) other problems in statistics and operations research. It concerns the cumulative effect of a certain condition (such as the total amount of repair time since the time origin) for a system which can only be in two states, rather than the total number of events related to that condition (such as the total number of repairs since the time origin). Predictive models associated with this class of problem may be formulated.

iii. The Time-Dependent G/G/∞ Queueing System -

The applications of the G/G/∞ queueing system are well established. This problem can be formulated as a filtered renewal process through a judicious choice of the response function, and the time-dependent characteristics of this system may be studied as well as its steady-state behavior.

iv. A Production Problem -

Consider a manufacturing system which has unlimited capacity to produce an item. Assume that items arrive for production according to a renewal process and that the production times are i.i.d. random variables with known distribution function. One may be interested in determining, for any given time t since the start of production, the following:

a) the number of items in production;

b) the backlog of production time on the items which are in the production process;

c) the probability of an empty production system.

Clearly, this is a variant of the G/G/∞ queueing system, where the above quantities can be determined by the selection of an appropriate response function.
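The three quantities a) to c) can be sketched as filtered renewal functionals: item $m$ arrives at $W_m$, needs $Y_m$ units of production time, and contributes $w(t, \tau, y) = 1$ if $y > t - \tau$ (for the count) or $w(t, \tau, y) = \max(y - (t - \tau), 0)$ (for the backlog). The Monte Carlo sketch below is illustrative only; the function name and the Poisson/exponential parameters in the usage lines are hypothetical. They were chosen because, in that M/M/∞ special case, the count at time $t$ is Poisson with mean $(\lambda/\mu)(1 - e^{-\mu t})$, which gives a check on the estimates.

```python
import math
import random


def production_state_mc(t, draw_interarrival, draw_production_time, n_samples, seed=0):
    """Monte Carlo estimates, at time t after production start, of
    a) the mean number of items in production,
    b) the mean backlog of remaining production time, and
    c) the probability that the production system is empty."""
    rng = random.Random(seed)
    count_total = backlog_total = 0.0
    empty_runs = 0
    for _ in range(n_samples):
        count, backlog = 0, 0.0
        w_m = draw_interarrival(rng)                              # arrival time of item 1
        while w_m <= t:
            remaining = draw_production_time(rng) - (t - w_m)     # work left on item m at t
            if remaining > 0.0:
                count += 1                                         # response a)
                backlog += remaining                               # response b)
            w_m += draw_interarrival(rng)
        count_total += count
        backlog_total += backlog
        empty_runs += int(count == 0)                              # event for c)
    return count_total / n_samples, backlog_total / n_samples, empty_runs / n_samples


# Hypothetical check: Poisson arrivals (rate 3) and exponential production times (rate 1).
lam, mu, t = 3.0, 1.0, 2.0
mean_count, mean_backlog, p_empty = production_state_mc(
    t, lambda r: r.expovariate(lam), lambda r: r.expovariate(mu), 20000, seed=1
)
print(mean_count, mean_backlog, p_empty)
```

For general renewal arrivals the same function applies unchanged, while the closed-form check above no longer does.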

v.  Other  Problems  - 

Some  of  the  other  problems  that  may  be  formulated  are: 

a) Predictive models for reliability systems, such as systems with standby components;

b) Predictive models for replacement and maintenance systems, including individual replacement, group replacement and age replacement;

c) Predictive models for systems described by the superposition of renewal processes.

vi.  A  Military  Application  - 

An important potential area in applying renewal theory is the formulation of generalized stochastic Lanchester equations for models of military combat. It is well known that these models are of the death or attrition type. The existing models could conceivably be extended based on the results presented in this paper.

For an ordinary renewal counting process $\{N(t), t \ge 0\}$, we have shown that a transition rate for the process can be generated which is both non-homogeneous and time-dependent, taking the form

$$\lambda_n(t) = \frac{f^{*(n+1)}(t)}{F^{*(n)}(t) - F^{*(n+1)}(t)}, \qquad n = 0, 1, 2, \ldots$$

(with $F^{*(0)}(t) \equiv 1$).


As a result, the unconditional distribution of $N(t)$, namely $P(n, t) = P\{N(t) = n\}$, satisfies the birth equations

$$\frac{dP(0,t)}{dt} = -\lambda_0(t)\, P(0,t)$$

$$\frac{dP(n,t)}{dt} = -\lambda_n(t)\, P(n,t) + \lambda_{n-1}(t)\, P(n-1,t), \qquad n = 1, 2, \ldots$$

with initial conditions given by $P(0,0) = 1$ and $P(n,0) = 0$ otherwise.
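As a numerical check of the transition rate and the birth equations, the sketch below assumes Erlang-2 interarrival times, each the sum of two exponential($\lambda$) stages (a hypothetical choice). Then $F^{*n}(t) - F^{*(n+1)}(t)$ is the probability that a Poisson($\lambda t$) number of completed stages equals $2n$ or $2n + 1$, and the rate simplifies to $\lambda_n(t) = \lambda^2 t / (2n + 1 + \lambda t)$. Integrating the truncated birth equations with classical RK4 should then reproduce $P\{N(t) = n\}$ up to integration error. The function names are illustrative.

```python
import math


def renewal_pmf(n, lam, t):
    """P(N(t) = n) = F^{*n}(t) - F^{*(n+1)}(t) for Erlang-2(lam) interarrival times:
    exactly 2n or 2n + 1 of the exponential(lam) stages are completed by time t."""
    x = lam * t
    return math.exp(-x) * (x ** (2 * n) / math.factorial(2 * n)
                           + x ** (2 * n + 1) / math.factorial(2 * n + 1))


def erlang2_conv_pdf(k, lam, t):
    """f^{*k}(t): density of the sum of k Erlang-2(lam) interarrival times (a Gamma(2k, lam) law)."""
    return lam ** (2 * k) * t ** (2 * k - 1) * math.exp(-lam * t) / math.factorial(2 * k - 1)


def birth_rate(n, lam, t):
    """lambda_n(t) = f^{*(n+1)}(t) / [F^{*n}(t) - F^{*(n+1)}(t)].
    For Erlang-2 interarrivals the ratio simplifies to lam^2 t / (2n + 1 + lam t),
    which also gives the correct limit 0 at t = 0."""
    return lam * lam * t / (2 * n + 1 + lam * t)


def solve_birth_equations(lam, t_end, n_max, steps):
    """Classical RK4 for the birth equations, truncated at state n_max, from P(0, 0) = 1."""
    def deriv(t, p):
        rates = [birth_rate(n, lam, t) for n in range(n_max + 1)]
        return [-rates[0] * p[0]] + [
            -rates[n] * p[n] + rates[n - 1] * p[n - 1] for n in range(1, n_max + 1)
        ]

    h = t_end / steps
    p = [1.0] + [0.0] * n_max
    t = 0.0
    for _ in range(steps):
        k1 = deriv(t, p)
        k2 = deriv(t + h / 2, [a + h / 2 * b for a, b in zip(p, k1)])
        k3 = deriv(t + h / 2, [a + h / 2 * b for a, b in zip(p, k2)])
        k4 = deriv(t + h, [a + h * b for a, b in zip(p, k3)])
        p = [a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(p, k1, k2, k3, k4)]
        t += h
    return p


p = solve_birth_equations(lam=1.5, t_end=2.0, n_max=20, steps=2000)
print([round(v, 6) for v in p[:5]])
print([round(renewal_pmf(n, 1.5, 2.0), 6) for n in range(5)])
```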

For a death process involving an initial population size $N$, we have

$$P(N,0) = 1 \quad \text{and} \quad P(n,0) = 0, \qquad n \ne N.$$

We may write, for example, the following equations involving "linear" death rates $n\lambda_n(t)$:

$$\frac{dP(N,t)}{dt} = -N\lambda_N(t)\, P(N,t), \qquad n = N$$

$$\frac{dP(n,t)}{dt} = -n\lambda_n(t)\, P(n,t) + (n+1)\lambda_{n+1}(t)\, P(n+1,t), \qquad 1 \le n \le N - 1$$

$$\frac{dP(0,t)}{dt} = \lambda_1(t)\, P(1,t), \qquad n = 0$$

This  is  a  natural  extension  of  existing  death  models.  It  is  evident 
that  this  provides  a  framework  of  analysis  for  combat  models  by 
generalizing  the  Markovian  models  while  retaining  the  structural 
properties  for  obtaining  closed  form  solutions.  The  reader  is  referred 
to  Sivazlian  (1989)  for  a  recent  application  of  combat  modeling. 
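The linear death equations above can be integrated numerically for any supplied rates $\lambda_n(t)$. The sketch below uses classical RK4. As a hypothetical check it takes a constant per-capita rate $\lambda_n(t) = \mu$, which corresponds to independent exponential lifetimes; in that special case $P(n, t)$ must be binomial with parameters $N$ and $q = e^{-\mu t}$. The function name and parameter values are illustrative.

```python
import math


def solve_death_equations(rate, n0, t_end, steps):
    """Classical RK4 for the linear death equations with initial population n0:
         dP(N)/dt = -N rate(N, t) P(N)
         dP(n)/dt = -n rate(n, t) P(n) + (n + 1) rate(n + 1, t) P(n + 1),  1 <= n <= N - 1
         dP(0)/dt = rate(1, t) P(1)
    rate(n, t) plays the role of lambda_n(t); P(N, 0) = 1 and P(n, 0) = 0 otherwise."""
    def deriv(t, p):
        out = []
        for n in range(n0 + 1):
            d = -n * rate(n, t) * p[n]                      # outflow (zero for n = 0)
            if n < n0:
                d += (n + 1) * rate(n + 1, t) * p[n + 1]    # inflow from state n + 1
            out.append(d)
        return out

    h = t_end / steps
    p = [0.0] * n0 + [1.0]
    t = 0.0
    for _ in range(steps):
        k1 = deriv(t, p)
        k2 = deriv(t + h / 2, [a + h / 2 * b for a, b in zip(p, k1)])
        k3 = deriv(t + h / 2, [a + h / 2 * b for a, b in zip(p, k2)])
        k4 = deriv(t + h, [a + h * b for a, b in zip(p, k3)])
        p = [a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(p, k1, k2, k3, k4)]
        t += h
    return p


# Hypothetical check with a constant rate mu (exponential lifetimes).
mu, n0, t_end = 0.7, 5, 1.3
p = solve_death_equations(lambda n, t: mu, n0, t_end, steps=1000)
q = math.exp(-mu * t_end)
binom = [math.comb(n0, n) * q ** n * (1 - q) ** (n0 - n) for n in range(n0 + 1)]
print([round(v, 6) for v in p])
print([round(v, 6) for v in binom])
```

Time-dependent rates, for example the renewal-based $\lambda_n(t)$ above, can be passed in through `rate` without changing the solver.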


V.  CONCLUSIONS 

The  derivation  of  the  probability  law  (joint  distribution  function) 
of  a  renewal  counting  process,  and  the  analysis  of  the  filtered  renewal 
process,  form  the  theoretical  basis  1)  to  study  the  prediction  problem 
associated  with  systems  regulated  by  these  processes,  and  2)  to  apply 
the  results  to  several  useful  problems  in  reliability,  replacement, 
maintenance,  queueing,  production,  combat  analysis  and  other  areas  of 
operations  research. 

The  theoretical  knowledge  acquired  by  this  research  should  advance 
the  level  of  knowledge  in  the  statistical  characterization  of  renewal 
processes  and  filtered  renewal  processes.  Novel  insight  into  the 
properties  of  these  processes  and  a  better  understanding  of  their 
behavior  should  be  gained.  The  theoretical  results  derived  can  be  used 
immediately 

1) to solve a number of stochastic problems in operations research and other applied areas;

2)  to  assess  the  performance  characteristics  of  systems  which  can  be 
modeled  as  a  renewal  process  or  as  a  filtered  renewal  process  or  as 
a  process  based  on  or  related  to  the  two  previous  ones; 

3) to develop models for predicting the behavior and effectiveness of systems represented by any of the above processes. Typical measures of effectiveness could include, for example, system reliability, system maintainability and system availability.

In the long term, it is hoped that these processes may become, through an improved knowledge of their behavior, amenable to a wider variety of applications.
FOR COMPUTER SIMULATION OF GUIDED PROJECTILES

M. J. Amoruso and R. Campbell, ARDEC
H. Cohen, AMSAA

Sponsored  by:  U.  S.  Army  Armament  Research  Development  and  Engineering  Center 

Picatinny,  New  Jersey  07806-5000 

The Army’s Armament Research Development and Engineering Center (ARDEC) has been formulating methods for computationally efficient computer simulation of smart munitions. Time constants associated with autopilot components are often small compared with the time scales of their driving terms. The integration time step is consequently driven to very small values to achieve stable numerical integration, which results in increased computer run time. An innovative technique was developed in which exact analytic solutions to differential equations and transfer functions are applied in a piecewise manner within a larger but lower frequency problem that is solved numerically. Closed form analytical solutions were obtained for the following: the first-order lag, the first-order lag with differentiation, the first-order lead/lag, the first-order lag with integrator, the second-order lag/oscillator, a two-axis gimbaled gyro, and an impulse thruster. In addition to formulating piecewise analytic solutions for smart munition components, serial configurations of transfer functions should be replaced by equivalent parallel configurations.

This approach avoids difficulties arising from propagation of the signal through a sequential network of widely varying natural frequencies when using a relatively large piecewise integration time step, and produces a decomposition that leads to terms that can be readily integrated analytically. In some cases, considerable time savings were obtained.
STEPWISE CLOSED FORM TECHNIQUES
FOR  COMPUTER  SIMULATION  OF  GUIDED  PROJECTILES 


When modeling guided projectiles in six-degree-of-freedom (6 DOF) simulations, differential equations are obtained that describe the various component subsystems. These are then typically integrated numerically within the framework of the 6 DOF simulation. The largest allowable time step for the integration is bounded by two constraints: the driving term or input must not vary appreciably during a time step, and the time step must be sufficiently small to ensure a stable integration.

Since the driving terms vary at rates commensurate with the airframe motion, which is inherently slower than the processes associated with the autopilot, the stability requirement imposes a much smaller bound on the integration step size than the requirement that the driving terms remain essentially constant during the integration time step. By using analytic closed-form solutions for the differential equations of the autopilot, the second constraint (stability) appears to be eliminated. An innovative technique was developed by which exact analytical solutions to the required transfer functions are applied in a piecewise manner within a larger but lower frequency problem which must be solved numerically. The use of these piecewise analytical solutions to the transfer functions guarantees valid integration of the autopilot transfer functions regardless of the integration time step.

The overall system is usually decomposed into simpler factors in sequential order that are separately solved by numerical integration techniques, assuming that the input or driving term is essentially constant or linear during the integration time step. These factors are concatenated, with the output of one factor or block becoming the input to the next block. Since the integration time step is generally quite small to achieve stable numerical integration, negligible errors are introduced as the signal propagates through the usually modest number of transfer function blocks down to the output.

The new approach consists of introducing analytical closed-form solutions for the transfer function factors that were previously treated numerically. Since these are exact closed-form solutions for constant (or linear) driving terms, the solutions bridge the time step perfectly as long as the driving term is essentially constant (or linear) during the time step. See Figure 1.


Figure 1. Iterative propagation of the solution (time steps n - 1, n, n + 1, n + 2)
The result of the integration from the previous, or (n-1)th, time step is used as the initial condition for the current time step. A value for the driving term during the current, or nth, time step, along with the initial condition, is put into the analytic closed-form solution to propagate the solution from the beginning of the nth time step to its end, where the resulting solution becomes the initial condition for the next, or (n+1)th, time step, and so forth. This approach guarantees stable integration, contingent only upon the input remaining essentially constant or linear during the integration time step.
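The propagation scheme can be sketched for the simplest case, the first-order lag $\tau\, dy/dt + y = T$ with the driving term held constant over each step. The exact update is $y_{n+1} = T + (y_n - T)e^{-h/\tau}$. For contrast, the sketch also includes explicit Euler, which is unstable once $h > 2\tau$. The time constant, step size, and function names are hypothetical values chosen to mimic a fast autopilot lag integrated with a step sized for slower airframe motion.

```python
import math


def lag_step_exact(y, u, tau, h):
    """Closed-form update of tau*dy/dt + y = u over one step of length h,
    assuming the driving term u is constant during the step."""
    return u + (y - u) * math.exp(-h / tau)


def lag_step_euler(y, u, tau, h):
    """Explicit Euler update of the same equation (unstable when h > 2*tau)."""
    return y + h * (u - y) / tau


def propagate(step, y0, inputs, tau, h):
    """Step-by-step propagation: the output at the end of step n is the
    initial condition of step n + 1 (Figure 1)."""
    y = y0
    history = [y]
    for u in inputs:
        y = step(y, u, tau, h)
        history.append(y)
    return history


tau, h = 1e-3, 1e-2          # hypothetical: 1 ms lag, 10 ms integration step
inputs = [1.0] * 50          # constant unit driving term
exact = propagate(lag_step_exact, 0.0, inputs, tau, h)
euler = propagate(lag_step_euler, 0.0, inputs, tau, h)
print(exact[-1], euler[-1])  # exact settles at the input; Euler diverges
```

Because the closed-form update is exact for a constant input, one step of length $h$ gives the same result as two steps of length $h/2$. The step size is therefore limited only by how fast the input varies, not by $\tau$.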

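Table 1

Typical Autopilot and Actuator Transfer Functions

| TYPE | LAPLACE OPERATOR | DIFFERENTIAL EQUATION |
|---|---|---|
| First Order Lag | $\frac{1}{\tau s + 1}$ | $\tau \frac{dy}{dt} + y = T$ |
| First Order Lag with Differentiator | $\frac{s}{\tau s + 1}$ | $\tau \frac{dy}{dt} + y = \frac{dT}{dt}$ |
| First Order Lead/Lag | $\frac{\tau_1 s + 1}{\tau_2 s + 1}$ | $\tau_2 \frac{dy}{dt} + y = \tau_1 \frac{dT}{dt} + T$ |
| First Order Lag with Integrator | $\frac{1}{s(\tau s + 1)}$ | $\tau \frac{d^2 y}{dt^2} + \frac{dy}{dt} = T$ |
| Second Order Lag/Oscillator | $\frac{1}{Is^2 + Ds + K}$ | $I \frac{d^2 y}{dt^2} + D \frac{dy}{dt} + Ky = T$ |
| Second Order Lag/Oscillator with Differentiator | $\frac{s}{Is^2 + Ds + K}$ | $I \frac{d^2 y}{dt^2} + D \frac{dy}{dt} + Ky = \frac{dT}{dt}$ |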

Closed form solutions have been obtained for the typical transfer functions indicated in Table 1. Note that zero initial conditions are assumed for the Laplace operators in this table. Non-zero initial conditions will be described below.
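For the second-order lag/oscillator row of Table 1, $I\ddot y + D\dot y + Ky = T$, a closed-form step for a constant driving term is easy to sketch. Write $y = T/K + z$, where $z$ solves the homogeneous equation with characteristic roots $r_{1,2} = (-D \pm \sqrt{D^2 - 4IK})/(2I)$. Then fit $z(0)$ and $\dot z(0)$ from the state at the start of the step, which carries non-zero initial conditions across steps. The sketch assumes distinct roots ($D^2 \ne 4IK$). The numerical values (natural frequency 200 rad/s, damping ratio 0.3, 10 ms step) and the function name are hypothetical.

```python
import cmath


def oscillator_step_exact(y, ydot, drive, inertia, damping, stiffness, h):
    """Closed-form update of I y'' + D y' + K y = T over one step of length h,
    assuming T is constant during the step and the characteristic roots are
    distinct (D**2 != 4*I*K). Returns (y, dy/dt) at the end of the step."""
    y_ss = drive / stiffness                    # particular (static) solution T/K
    z0, zdot0 = y - y_ss, ydot                  # homogeneous part at the start of the step
    disc = cmath.sqrt(damping * damping - 4.0 * inertia * stiffness)
    r1 = (-damping + disc) / (2.0 * inertia)
    r2 = (-damping - disc) / (2.0 * inertia)
    c1 = (zdot0 - r2 * z0) / (r1 - r2)          # chosen so z(0) = z0 and z'(0) = zdot0
    c2 = (r1 * z0 - zdot0) / (r1 - r2)
    e1, e2 = cmath.exp(r1 * h), cmath.exp(r2 * h)
    z = c1 * e1 + c2 * e2
    zdot = r1 * c1 * e1 + r2 * c2 * e2
    return (y_ss + z).real, zdot.real           # imaginary parts cancel for real data


# Hypothetical stiff example: I = 1, D = 120, K = 40000, constant T = 400.
y, ydot = 0.0, 0.0
for _ in range(200):
    y, ydot = oscillator_step_exact(y, ydot, 400.0, 1.0, 120.0, 40000.0, 0.01)
print(y)   # approaches the static deflection T/K = 0.01
```

Complex arithmetic lets the same few lines cover both the underdamped (complex roots) and overdamped (real roots) cases. The critically damped case would need the repeated-root form $(c_1 + c_2 t)e^{rt}$.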


Savings in computer time vary from case to case, typically reaching up to an order of magnitude. An impulsive thruster consisting of rapidly burning material in a groove on the side of a spin-stabilized projectile was modeled with great savings in execution time; the results in Table 2 were obtained. Note that three different time steps were used for the numerical integration. With the coarsest time step ($10^{-6}$ sec), agreement was to only 3 significant digits; for this case the analytical approach ran 23 times faster. Agreement between the analytical and numerical approaches could be increased by two more significant digits by decreasing the integration time step by an order of magnitude, with a corresponding increase in run time for the numerical approach.


Table 2

Impulsive Thruster Modeling

| APPROACH | TIME STEP (sec) | CPU TIME (sec) | $\phi_y$ (rad) | $\phi_z$ (rad) |
|---|---|---|---|---|
| Analytical | N/A | 0.0106 | 0.53275603 | -0.41630908 |
| Numerical | $10^{-8}$ | 22.096 | 0.53275601 | -0.41630910 |
| Numerical | $10^{-7}$ | 2.278 | 0.53275061 | -0.41631581 |
| Numerical | $10^{-6}$ | 0.246 | 0.53234719 | -0.41681537 |


However,  using  this  approach,  an  unexpected  complication  was  discovered. 
If  several  factors  are  concatenated  to  represent  a  more  complex  transfer 
function,  the  final  output  can  be  found  to  depend  on  the  order  of  the  transfer 
function  factors.  This  difficulty  arises  because,  although  the  input  of  a  block 
might  be  essentially  constant  during  an  integration  time  step,  the  output  might 
not  be  if  the  frequency  response  of  the  block  is  relatively  high.  This  output 
becomes  the  input  to  the  next  transfer  function  block.  The  requirement  that 
the  input  to  this  next  block  be  essentially  constant  can  break  down  unless  the 
sampling  rate  is  high.  This  requires  a  smaller  integration  time  step  and 
longer  computer  execution  time.  This  dilemma  does  not  arise  in  the  former 
numerical  integration  approach  because  the  very  fine  time  step  required 
avoided  difficulties  associated  with  incompatibilities  of  bandwidth  and 
frequency  content  as  the  signal  propagated  from  block  to  block. 
The solution adopted was the conversion of a complex transfer function to
a  parallel  representation  instead  of  a  serial  representation.  The  obvious 
advantage  to  an  equivalent  parallel  representation  is  that  each  block  receives 
the  same  input  at  the  same  time  and  each  block  produces  its  output  at  the  end 
of  the  same  single  time  step.  These  outputs  do  not  become  inputs  to  other 
autopilot  transfer  function  blocks,  but  are  instead  summed  to  produce  the 
overall  output  for  the  overall  transfer  function.  Generally,  this  technique  not 
only  avoids  simulation  errors  arising  from  time  step  size  incompatibilities 
with  bandwidth,  internal  lag,  and  frequency  content  of  the  signal  propagating 
through  the  sequence  of  transfer  function  blocks  but  also  has  additional 
benefits.  In  addition  to  producing  an  algorithm  that  is  considerably  faster 
than  previous  numerical  integration  approaches,  parallel  decomposition 
generally  leads  to  a  combination  of  elementary  expressions,  whose  Laplace 
inverse  and  analytical  integration  are  well  known. 
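The serial-versus-parallel issue can be reproduced with a hypothetical cascade of two first-order lags, $1/((\tau_1 s + 1)(\tau_2 s + 1))$, with $\tau_1 \ne \tau_2$. Its partial fraction expansion is $a/(\tau_1 s + 1) + b/(\tau_2 s + 1)$ with $a = \tau_1/(\tau_1 - \tau_2)$ and $b = -\tau_2/(\tau_1 - \tau_2)$. In the serial sketch, the end-of-step output of the first block is held constant as the input of the second block (an assumed sample-and-hold choice). In the parallel sketch, both branches see the same constant input and their outputs are summed, which is exact at every step for a step input.

```python
import math


def lag_step(y, u, tau, h):
    """Exact update of tau*dy/dt + y = u over a step h with u held constant."""
    return u + (y - u) * math.exp(-h / tau)


def cascade_serial(u, tau1, tau2, h, n_steps):
    """Serial form: block 1's end-of-step output is treated as a constant input
    to block 2 during the same step (an approximation)."""
    x = y = 0.0
    for _ in range(n_steps):
        x = lag_step(x, u, tau1, h)
        y = lag_step(y, x, tau2, h)
    return y


def cascade_parallel(u, tau1, tau2, h, n_steps):
    """Parallel form from 1/((tau1 s+1)(tau2 s+1)) = a/(tau1 s+1) + b/(tau2 s+1);
    every branch receives the same input and the outputs are summed."""
    a = tau1 / (tau1 - tau2)
    b = -tau2 / (tau1 - tau2)
    z1 = z2 = 0.0
    for _ in range(n_steps):
        z1 = lag_step(z1, a * u, tau1, h)
        z2 = lag_step(z2, b * u, tau2, h)
    return z1 + z2


def cascade_exact(u, tau1, tau2, t):
    """Exact response of the cascade to a step of height u applied at t = 0."""
    return u * (1.0 - (tau1 * math.exp(-t / tau1) - tau2 * math.exp(-t / tau2)) / (tau1 - tau2))


tau1, tau2, h, n = 0.05, 0.002, 0.01, 5      # hypothetical: slow and fast lag, 10 ms step
print(cascade_exact(1.0, tau1, tau2, n * h),
      cascade_parallel(1.0, tau1, tau2, h, n),
      cascade_serial(1.0, tau1, tau2, h, n))
```

The parallel result matches the exact response to rounding error. The serial result is off by roughly one percent here, because the first block's output is far from constant over a 10 ms step.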

The  general  treatment  to  implement  this  technique  is  outlined  as  follows: 

(1)  Writing  down  the  Laplace  operator  expression  including 
non-zero  initial  conditions  from  the  differential  equation 
description  or  from  the  block  diagram  (which  usually  will  not 
show  the  initial  conditions) 

(2)  Factoring 

(3)  Making  a  formal  partial  fraction  expansion 

(4)  Finding  the  expansion  coefficients 

(5)  Writing  down  the  expanded  Laplace  transform. 

The  latter  can  then  be  inverted  from  standard  tables  of  inverse  Laplace 
transforms  to  obtain  the  analytic  solutions  in  parallel  decomposition  or 
calculated  using  the  residue  theorem  of  complex  variables. 

It  is  worthwhile  to  emphasize  the  first  of  the  steps  enumerated  above. 
Autopilots,  seekers,  control  actuators,  and  other  components  of  guided 
projectiles  are  conventionally  described  in  block  diagrams  in  terms  of  the 
Laplace  operator  s.  Typically,  these  block  diagrams  represent  the  underlying 
differential  equations  only  if  the  initial  conditions  vanish.  This  is  a 
convenient shorthand notation but can also be a source of confusion. Recall that for initial conditions $y_0 = y(t = 0)$, $\dot y_0 = \frac{dy}{dt}(t = 0)$, etc., the Laplace transform of the nth derivative of a time domain function is given by

$$\mathcal{L}[y^{(n)}] = s^n \mathcal{L}[y(t)] - s^{n-1} y_0 - s^{n-2} \dot y_0 - \cdots - y_0^{(n-1)} \qquad (1)$$

where $y^{(n)}$ represents the nth derivative of $y(t)$ with respect to $t$, and $\mathcal{L}$ and $s$ represent the Laplace operator and variable, respectively.


There are two methods for implementing this procedure. The first involves taking limits and derivatives and requires factoring the transfer function into first-order systems. The second requires the solution of sets of simultaneous equations and does not involve limits or derivatives; factoring the transfer function into first-order terms is optional in the second method. These techniques are well known from partial fraction expansions of algebraic expressions.

A concrete illustration follows. Consider an autopilot component represented by the block diagram in Figure 2. The differential equation corresponding to this block diagram is

$$\ddot y(t) + \dot y(t) - 2y(t) = \text{driving term} = K(t - t_0) + K_0 = Kt + L \qquad (2)$$

Figure 2. Example of block diagram

Here $K$, $K_0$, $t_0$ and $L = K_0 - Kt_0$ are constants and $t$ is time. The driving term has the Laplace transform $K/s^2 + L/s$. Let $\mathcal{L}\{y(t)\} = Y(s)$. The Laplace transform of the differential equation is

$$(s^2 + s - 2)\,Y(s) - y_0(1 + s) - \dot y_0 = \frac{K + sL}{s^2} \qquad (3)$$

where the additional terms are due to the initial conditions defined by $y_0 = y(0)$ and $\dot y_0 = \frac{dy}{dt}\big|_{t=0}$. This may be written (factoring the denominator)

$$Y(s) = \frac{K + sL + s^2(y_0 + \dot y_0) + s^3 y_0}{s^2 (s + 2)(s - 1)} \qquad (4)$$
This may be formally expanded as follows:

$$Y(s) = \frac{A}{s^2} + \frac{B}{s} + \frac{C}{s + 2} + \frac{D}{s - 1} \qquad (5)$$

If there are no multiple roots, the partial fraction expansion coefficients can be evaluated one at a time by taking the corresponding factors of the denominator of (5) one at a time, multiplying the right sides of (4) and (5) by that factor, and then equating right sides. The resulting expression is taken to the limit as the factor goes to zero. This causes all expansion coefficients but one to drop out. Note that this technique fails when trying to find A or B because of the multiple root: the expression $\lim_{s \to 0} sY(s)$ does not exist. If instead one tries $\lim_{s \to 0} s^2 Y(s)$, $A = -K/2$ is obtained. To obtain B, take $\lim_{s \to 0} \frac{d}{ds}\big[s^2 Y(s)\big]$, and $B = -(K + 2L)/4$ results. In this way, using one factor at a time, all the expansion coefficients may be obtained.


The result of a partial fraction decomposition is

$$Y(s) = -\frac{K}{2s^2} - \frac{K + 2L}{4s} + \frac{4y_0 - 4\dot y_0 - K + 2L}{12(s + 2)} + \frac{K + L + 2y_0 + \dot y_0}{3(s - 1)} \qquad (6)$$

This transition to parallel decomposition or expansion is shown in Figure 3 for this simple case. It is a simple matter to invert this expression into the time domain:

$$y(t) = -\frac{Kt}{2} - \frac{K + 2L}{4} + \frac{4y_0 - 4\dot y_0 - K + 2L}{12}\, e^{-2t} + \frac{K + L + 2y_0 + \dot y_0}{3}\, e^{t} \qquad (7)$$
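A quick way to confirm the inverted partial fractions is to compare them with a direct numerical solution of $\ddot y + \dot y - 2y = Kt + L$. The sketch below takes the initial conditions at $t = 0$ and integrates the equation with classical RK4. The parameter values in the usage line are arbitrary, hypothetical test values.

```python
import math


def y_closed_form(t, K, L, y0, ydot0):
    """Time-domain solution obtained by inverting the partial fractions of Y(s)."""
    C = (4.0 * y0 - 4.0 * ydot0 - K + 2.0 * L) / 12.0    # coefficient of exp(-2t)
    D = (K + L + 2.0 * y0 + ydot0) / 3.0                   # coefficient of exp(t)
    return -K * t / 2.0 - (K + 2.0 * L) / 4.0 + C * math.exp(-2.0 * t) + D * math.exp(t)


def y_rk4(t_end, K, L, y0, ydot0, steps):
    """Reference: integrate y'' + y' - 2y = K t + L with classical RK4 on (y, y')."""
    def f(t, y, v):
        return v, K * t + L - v + 2.0 * y                  # (y', y'') from the ODE

    h = t_end / steps
    t, y, v = 0.0, y0, ydot0
    for _ in range(steps):
        k1y, k1v = f(t, y, v)
        k2y, k2v = f(t + h / 2, y + h / 2 * k1y, v + h / 2 * k1v)
        k3y, k3v = f(t + h / 2, y + h / 2 * k2y, v + h / 2 * k2v)
        k4y, k4v = f(t + h, y + h * k3y, v + h * k3v)
        y += h / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += h
    return y


args = (0.8, -0.3, 0.5, -1.2)                              # hypothetical K, L, y0, ydot0
print(y_closed_form(1.5, *args), y_rk4(1.5, *args, 3000))
```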


An alternate approach is algebraic. The first step in the algebraic approach is always the same as above: the denominator of the transfer function is factored into monomials, and the expression is expanded in terms of these monomials as before. Write the right side of the formal partial fraction decomposition with a least common denominator and equate it to the Laplace expression for the autopilot transfer function:

$$Y(s) = \frac{K + sL + s^2(y_0 + \dot y_0) + s^3 y_0}{s^2(s^2 + s - 2)} \qquad (8)$$

$$= \frac{A}{s^2} + \frac{B}{s} + \frac{C}{s + 2} + \frac{D}{s - 1} = \frac{A(s^2 + s - 2) + Bs(s^2 + s - 2) + Cs^2(s - 1) + Ds^2(s + 2)}{s^2(s^2 + s - 2)}$$

Since the denominators are equal, the numerators must also be equal. Making use of the linear independence of powers of s, a set of equations is obtained for the expansion coefficients, which must be solved simultaneously. This yields the same result as before.
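Equating the coefficients of $s^3, s^2, s, 1$ in the two numerators gives $B + C + D = y_0$, $A + B - C + 2D = y_0 + \dot y_0$, $A - 2B = L$ and $-2A = K$. The sketch below solves this linear system exactly with rational arithmetic (plain Gauss-Jordan elimination, written out here rather than taken from any library). It can be used to confirm that the algebraic method reproduces the coefficients found by the limit method. The function names and the test values are illustrative.

```python
from fractions import Fraction


def solve_linear(a, b):
    """Gauss-Jordan elimination with exact Fraction arithmetic (nonsingular system assumed)."""
    n = len(b)
    m = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def expansion_coefficients(K, L, y0, ydot0):
    """A, B, C, D of the expansion, from equating powers of s in the numerators."""
    a = [[0, 1, 1, 1],     # s^3: B + C + D
         [1, 1, -1, 2],    # s^2: A + B - C + 2D
         [1, -2, 0, 0],    # s^1: A - 2B
         [-2, 0, 0, 0]]    # s^0: -2A
    b = [y0, y0 + ydot0, L, K]
    return solve_linear(a, b)


coeffs = expansion_coefficients(3, -1, 2, 5)    # hypothetical K, L, y0, ydot0
print(coeffs)
```

The limit method gives $A = -K/2$, $B = -(K + 2L)/4$, $C = (4y_0 - 4\dot y_0 - K + 2L)/12$ and $D = (K + L + 2y_0 + \dot y_0)/3$. For these values that is $-3/2, -1/4, -17/12, 11/3$.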

Alternatively,  the  investigator  may  wish  to  retain  some  higher  order  terms 
rather  than  reduce  all  the  denominators  to  monomials,  perhaps  to  retain  a 
physical  interpretation  of  terms. 


In  summary,  by  giving  up  the  generality  of  numerical  integration  and 
using  closed  form  solutions  to  particular  differential  equations  in  stepwise 
fashion,  significant  savings  in  computer  run  time  can  be  obtained.  Care  must 
be  taken  when  concatenating  several  such  solutions  in  sequence.  If  the 
product  of  several  sequential  transfer  functions  can  be  recast  into  an 
equivalent  network  of  transfer  functions  in  parallel,  difficulties  arising  from 
propagation  of  the  signal  through  a  sequential  network  can  be  eliminated  even 
when  relatively  large  integration  time  steps  are  used.  This  can  be  done  by 
making  a  partial  fraction  decomposition  of  the  Laplace  operator 
representation.  This  technique  produces  a  decomposition  that  leads  to  terms 
that  can  be  readily  integrated. 
Figure 3. Example of transition to parallel representation: (a) Step 1, the single block $\frac{K + sL + s^2(y_0 + \dot y_0) + s^3 y_0}{s^2(s^2 + s - 2)}$; (b) Step 2, the same block with the denominator factored as $s^2(s + 2)(s - 1)$; (c) Steps 3-4, the parallel blocks $-\frac{K}{2s^2}$, $-\frac{K + 2L}{4s}$, $\frac{4y_0 - 4\dot y_0 - K + 2L}{12(s + 2)}$ and $\frac{K + L + 2y_0 + \dot y_0}{3(s - 1)}$, whose outputs are summed.
ABSTRACT: A number of possible optimizations for Grobner basis construction are presented. Currently we are developing a system which permits easy experimentation with these and other optimizations. Perhaps as important as the hoped-for optimizations, a conceptual framework is discussed within which these and other possible optimizations are easily presented. The framework should be elaborated by others, as needed, to support their own optimization experiments.

INTRODUCTION: Here are several related potential optimizations for finding Grobner bases within $k[X] = k[X_1, \ldots, X_n]$, where $k$ is a field. The main idea is to precipitate internal reduction by finding elements whose lead monomials divide other (lead) monomials. This obviates some S-pairs and may cut down storage requirements. We also use a generalization of the rule of discarding the S-pair of f and g when the lead monomials of f and g are relatively prime. Our optimizations are supported by a novel approach to Grobner basis construction. Typically, Grobner basis construction is formulated in terms of two sets G and P: G is the forming Grobner basis, and P is a set of particular S-pairs from G × G which remain to be reduced. If all of the S-pairs in P reduce to zero over G, then G is a Grobner basis. One traditional optimization question is avoiding unnecessary S-pair reductions, that is, how to keep P small. Unnecessary reductions waste computation, and an unnecessarily large set P wastes memory.

The underlying idea of our approach is formulated in terms of three sets, G, J and P, where P is a set of particular S-pairs from G × G. (G, J, P) has the following property:

(*) if all of the S-pairs in P reduce to zero over G ∪ J, and if all of the S-pairs from G × J and J × J reduce to zero over G ∪ J, then G ∪ J is a Grobner basis.

By dividing the forming Grobner basis into two sets G and J, it is only necessary to explicitly keep track of S-pairs from G × G. The S-pairs from G × J and J × J are known implicitly because G and J are known. This saves memory. Part of our approach is the notion of allowable moves. An allowable move might remove an element from P at the expense of adding an element to J; reduction of an element of P over G ∪ J is such an example. An allowable move might move an element from J to G and add elements to P. An allowable move might move a number of elements from G to J and discard elements of P. Allowable moves preserve (*). The objective is to use allowable moves to reach a stage where P and J are empty. When P becomes empty there are no longer explicit S-pairs from G × G which need to be reduced, and when J becomes empty there are no longer implicit S-pairs from G × J and J × J to reduce, so G is a Grobner basis by (*).

(G, J, P) satisfying (*) is the underlying idea of our approach to optimization, but we have added two refinements:

J is subdivided into two sets H and R;
two properties are added to (*), which are preserved by the allowable moves.

Our prospective optimizations are untested, and other refinements of the underlying idea of (G, J, P) satisfying (*) may prove to be more effective. We encourage researchers in the area of Grobner basis optimization to experiment with their own refinements to (G, J, P) satisfying (*).

At the present time we are developing a system to test these and related potential optimizations. The system MACAULAY by Bayer and Stillman is a Grobner basis based computer algebra system, but it is not designed for experimenting with variations on the fundamental algorithms. A selection of papers addressing Grobner basis optimization can be found in the references.

THE SETTING: Suppose we have an implicit term ordering which allows us to find LM(f), the lead monomial of f. Let MLCM(f, g), the monomial LCM of f and g, denote LCM(LM(f), LM(g)). Define the NT order on k[X] in terms of the original implicit order as follows: for f, g ∈ k[X], f < g in the NT order if

total degree f < total degree g

or

total degree f = total degree g and LM(f) > LM(g) in the original implicit order.
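The NT order can be sketched as a comparison function. This sketch makes several assumptions. Polynomials are stored as dictionaries mapping exponent tuples to nonzero coefficients. The "original implicit order" is taken to be graded lexicographic order, which is a hypothetical choice, since the paper leaves that order unspecified. All names are illustrative. Note that two distinct polynomials with the same total degree and the same lead monomial compare as equal, so the NT order is only a preorder.

```python
from functools import cmp_to_key


def grlex_key(monomial):
    """Hypothetical implicit term order: graded lexicographic on exponent tuples."""
    return (sum(monomial), monomial)


def lead_monomial(poly):
    """LM(f) for a polynomial stored as {exponent tuple: nonzero coefficient}."""
    return max(poly, key=grlex_key)


def total_degree(poly):
    return max(sum(m) for m in poly)


def nt_compare(f, g):
    """Return -1, 0 or 1 according to whether f <, ~, > g in the NT order."""
    df, dg = total_degree(f), total_degree(g)
    if df != dg:
        return -1 if df < dg else 1                  # smaller total degree comes first
    lf, lg = grlex_key(lead_monomial(f)), grlex_key(lead_monomial(g))
    if lf == lg:
        return 0
    return -1 if lf > lg else 1                      # equal degree: LARGER lead monomial first


f = {(2, 0): 1, (0, 1): 1}    # x^2 + y
g = {(1, 1): 1}               # x*y
h = {(1, 0): 1}               # x
print(sorted([g, f, h], key=cmp_to_key(nt_compare)))   # x, then x^2 + y, then x*y
```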

Here is the promised generalization of discarding the S-pair of f and g when the lead monomials of f and g are relatively prime. For p = (f, g) ∈ k[X] × k[X], S(p) = S(f, g) denotes the S-polynomial for the pair (f, g), formed as follows:

Let m be the monomial MLCM(f, g). Write $f = q_1 m + r$ and $g = q_2 m + s$, where m does not divide any of the monomials of r or s. For S(f, g) use

$$ q_2 f - q_1 g = q_2 r - q_1 s. $$

For P ⊂ k[X] × k[X], S(P) = {S(p) | p ∈ P}.
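Read literally with m = MLCM(f, g), the monomial m divides LM(f) only when LM(g) divides LM(f), so q_1 would usually be zero. The identity q_2 f − q_1 g = q_2 r − q_1 s behaves as a generalized S-polynomial if m is instead taken to be the monomial GCD of LM(f) and LM(g). Under that choice the lead terms cancel, and S(f, g) = 0 exactly when the lead monomials are relatively prime. The Python sketch below adopts the GCD reading as our own assumption. Other assumptions:

- polynomials are dictionaries from exponent tuples to rational coefficients;
- the term order is lex;
- all function names are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Tuple

Monomial = Tuple[int, ...]
Poly = Dict[Monomial, Fraction]  # exponent vector -> nonzero coefficient


def lead_monomial(f: Poly) -> Monomial:
    # Assumption: lex order, so tuple comparison finds the lead monomial.
    return max(f)


def divides(a: Monomial, b: Monomial) -> bool:
    return all(x <= y for x, y in zip(a, b))


def mono_gcd(a: Monomial, b: Monomial) -> Monomial:
    return tuple(min(x, y) for x, y in zip(a, b))


def split(f: Poly, m: Monomial) -> Tuple[Poly, Poly]:
    """Write f = q*m + r, where m divides no monomial of r."""
    q: Poly = {}
    r: Poly = {}
    for mono, c in f.items():
        if divides(m, mono):
            q[tuple(x - y for x, y in zip(mono, m))] = c
        else:
            r[mono] = c
    return q, r


def mul(f: Poly, g: Poly) -> Poly:
    out: Poly = {}
    for a, ca in f.items():
        for b, cb in g.items():
            k = tuple(x + y for x, y in zip(a, b))
            out[k] = out.get(k, Fraction(0)) + ca * cb
    return {k: v for k, v in out.items() if v != 0}


def sub(f: Poly, g: Poly) -> Poly:
    out = dict(f)
    for k, v in g.items():
        out[k] = out.get(k, Fraction(0)) - v
    return {k: v for k, v in out.items() if v != 0}


def s_poly(f: Poly, g: Poly) -> Poly:
    """S(f, g) = q2*r - q1*s (equal to q2*f - q1*g), m = gcd of lead monomials."""
    m = mono_gcd(lead_monomial(f), lead_monomial(g))
    q1, r = split(f, m)
    q2, s = split(g, m)
    return sub(mul(q2, r), mul(q1, s))


if __name__ == "__main__":
    one = Fraction(1)
    f = {(2, 1): one, (0, 0): one}   # x^2*y + 1
    g = {(1, 2): one, (0, 0): one}   # x*y^2 + 1
    print(s_poly(f, g))              # represents y - x
```

For relatively prime lead monomials, such as f = x + 1 and g = y + 1, the GCD is 1, so r = s = 0 and the sketch returns the zero polynomial (an empty dictionary) without any reduction.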

We assume there is an appropriate notion of reduction of a given element over a set. Typically this is repeated reduction of the lead term of the given element over the set, or repeated reduction of all terms of the given element over the set. We are experimenting with both. Whichever notion of reduction is used, if a given element is fully reduced over a set, then the lead monomial of the given element must not be divisible by the lead monomial of any element of the set.
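For concreteness, here is a minimal Python sketch of the second notion, repeated reduction of all terms. It uses the same assumptions as the earlier sketches: exponent-tuple dictionaries, lex order, and nonzero basis elements. `reduce_full` is a hypothetical name. No monomial of its result, and in particular not the lead monomial, is divisible by the lead monomial of any element of the set.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

Monomial = Tuple[int, ...]
Poly = Dict[Monomial, Fraction]


def divides(a: Monomial, b: Monomial) -> bool:
    return all(x <= y for x, y in zip(a, b))


def reduce_full(f: Poly, basis: List[Poly]) -> Poly:
    """Fully reduce f over basis (lex order assumed, basis elements nonzero)."""
    work: Poly = {k: Fraction(v) for k, v in f.items() if v != 0}
    remainder: Poly = {}
    while work:
        mono = max(work)              # largest remaining term
        coeff = work[mono]
        for b in basis:
            lm = max(b)               # lead monomial of b
            if divides(lm, mono):
                # subtract (coeff / lc(b)) * (mono / lm) * b to cancel the term
                shift = tuple(x - y for x, y in zip(mono, lm))
                factor = coeff / b[lm]
                for bm, bc in b.items():
                    k = tuple(x + y for x, y in zip(bm, shift))
                    work[k] = work.get(k, Fraction(0)) - factor * bc
                work = {k: v for k, v in work.items() if v != 0}
                break
        else:
            # no lead monomial divides this term: it stays in the result
            remainder[mono] = coeff
            del work[mono]
    return remainder
```

For example, reducing x^2 over {x − 1} cancels x^2 and then x, leaving the constant 1. Reducing xy + y^2 over {x + y} leaves zero.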

THE FRAMEWORK: The approach we are about to describe involves several stages of Grobner basis construction. Grobner basis construction is frequently described in terms of two sets: the forming Grobner basis, and the S-pairs which remain to be checked. The easiest way to describe our ideas is to split the forming Grobner basis into three parts and explicitly keep track of the remaining S-pairs for the first of the three sets.

A Grobner frame¹ is the following:

Three sets G, H, R ⊂ k[X] and a set P ⊂ G × G, where

1. G ∪ H ∪ R is a Grobner basis if every element in

$$ S(P) \cup S\big((G \cup H \cup R) \times (G \cup H \cup R) \setminus G \times G\big) $$

reduces to zero over G ∪ H ∪ R.

2. For every (f, g) ∈ P, LM(f) and LM(g) are NOT relatively prime.

3. Every element of G ∪ H ∪ R is fully reduced with respect to G ∪ H.

EXAMPLE:  GETTING  STARTED.  Given  a  subset  of  k[X],  call  the  set 
R.  Let  P,  G  and  H  be  empty  sets. 

Given a Grobner frame, the object is to use only the ALLOWABLE MOVES, described below, to decrease P, H and R to the empty set. When this is achieved, G is a Grobner basis. If three sets G, H, R ⊂ k[X] and a set P ⊂ G × G form a Grobner frame, they still form a Grobner frame after any allowable move.

ALLOWABLE  MOVES: 

MOVE P: If P is not empty, choose p ∈ P, set P = P \ {p}, form S(p), and fully reduce S(p) over G ∪ H ∪ R. If the final reductum r is not zero, set R = R ∪ {r}.

MOVE H: If H is not empty, choose h ∈ H and set H = H \ {h}. Let M be the ideal in the monoid of monomials generated by MLCM(g, h) as g runs over G. Let G' ⊂ G be chosen such that {MLCM(g, h) | g ∈ G'} generates M. When the MLCM(g, h)'s are being computed, make note of those g such that LM(g) and LM(h) are relatively prime. Let G'' be the elements of G' where LM(g) and LM(h) are not relatively prime. Set P = P ∪ (G'' × {h}) and set G = G ∪ {h}.

¹ One frame in the movie of the forming Grobner basis.

MOVE R: If R is not empty, choose s ∈ R and set R = R \ {s}. Let R' start as the empty set. For each m ∈ R, reduce m with respect to {s} ∪ G ∪ H, starting with {s}. If the final reductum r is not zero, set R' = R' ∪ {r}. When done considering all m ∈ R, set R = R'. Let H' start as the empty set. For each h ∈ H, reduce h with respect to {s} ∪ G ∪ (H \ {h}), starting with {s}. If the final reductum r has the same lead monomial as h, set H' = H' ∪ {r}. Otherwise, if r is non-zero, set R = R ∪ {r}. When done considering all h ∈ H, set H = H' ∪ {s}. Let G' start as the empty set. For each g ∈ G, reduce g with respect to {s} ∪ (G \ {g}) ∪ H, starting with {s}. If the final reductum r has the same lead monomial as g, then set G' = G' ∪ {r}, replace each (f, g) ∈ P by (f, r), and replace each (g, f) ∈ P by (r, f). Otherwise, delete each (f, g) ∈ P and each (g, f) ∈ P from P, and if r is non-zero (its lead monomial is then less than that of g), set R = R ∪ {r}. When done considering all g ∈ G, set G = G'.
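The pair bookkeeping in MOVE H depends only on lead monomials, so it can be sketched separately. The Python sketch below uses hypothetical names and exponent-tuple monomials. It proceeds in three steps:

1. Compute MLCM(g, h) for every g ∈ G.
2. Keep one g for each minimal generator of the monomial ideal M; these form G'.
3. Drop those g whose lead monomial is relatively prime to LM(h).

The returned indices are the elements of G'' that get paired with h.

```python
from typing import List, Tuple

Monomial = Tuple[int, ...]


def divides(a: Monomial, b: Monomial) -> bool:
    return all(x <= y for x, y in zip(a, b))


def mono_lcm(a: Monomial, b: Monomial) -> Monomial:
    return tuple(max(x, y) for x, y in zip(a, b))


def relatively_prime(a: Monomial, b: Monomial) -> bool:
    return all(min(x, y) == 0 for x, y in zip(a, b))


def move_h_pairs(g_leads: List[Monomial], h_lead: Monomial) -> List[int]:
    """Indices of G'' in MOVE H, given LM(g) for g in G and LM(h)."""
    lcms = [mono_lcm(g, h_lead) for g in g_leads]
    g_prime = []
    for i, m in enumerate(lcms):
        # m is not a minimal generator of M if a different MLCM divides it;
        # among equal MLCMs only the first one is kept.
        redundant = any(
            (other != m and divides(other, m)) or (other == m and j < i)
            for j, other in enumerate(lcms)
            if j != i
        )
        if not redundant:
            g_prime.append(i)           # G': one g per minimal generator of M
    # G'': elements of G' whose lead monomial is not relatively prime to LM(h)
    return [i for i in g_prime if not relatively_prime(g_leads[i], h_lead)]
```

Take G with lead monomials x^2, y, xy and LM(h) = x. The element with lead monomial xy is dropped, because MLCM(xy, x) = xy is already generated by y. The element y is then excluded because its lead monomial is relatively prime to x. Only the first element is paired with h.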

MOVEs R, H and P involve choosing elements from R, H and P. Among others, we are experimenting with the following priorities:

1.  When  R  is  not  empty  do  MOVE  R. 

2.  When  R  is  empty  but  H  is  not  empty,  do  MOVE  H. 

3.  When  R  and  H  are  empty,  do  MOVE  P. 

Our  goal  is  to  precipitate  internal  reduction  by  finding  elements  whose  lead 
monomial  divides  many  other  lead  monomials.  These  priorities  are  expected 
to  precipitate  internal  reduction  at  relatively  little  computational  expense. 
We  elaborate  on  this  theme  when  discussing  MOVE  R. 

For MOVE R. Here one chooses an element r ∈ R and reduces elements of G ∪ H ∪ R over {r}. One could tentatively pick each element r ∈ R and count how many elements of G ∪ H ∪ R would reduce over that {r}, then use the r which causes the most reduction as the actual choice. This would be computationally expensive. Clever data structures and clever updating of information might achieve something along these lines.

At present we are investigating the following rule: choose the element of R whose lead monomial is as small as possible in the NT order. The hope is that this will pick elements r ∈ R which tend to cause reduction.

For  MOVE  H.  We  have  no  particular  best  guess  at  this  point. 

For MOVE P. Ideally, one would choose (f, g) ∈ P where the lead monomial of the final reductum of S(f, g) causes as much internal reduction as possible. Or, following the simplification for MOVE R, choose (f, g) ∈ P where the lead monomial of the final reductum of S(f, g) is as small as possible in the NT order. Even this simplification seems much too computationally expensive. A further simplification is: choose (f, g) ∈ P where MLCM(f, g) is as small as possible in the NT order. MLCM(f, g) gets computed (and can be saved) for each pair (f, g) when building up P. Potential optimization aside, S(f, g) need only be computed when the pair (f, g) is selected in MOVE P. Pairs may be discarded from P without ever computing (or reducing) S(f, g). Thus the question: for purposes of optimization, how much work should be done with S(f, g) for pairs (f, g) ∈ P which have not been selected in MOVE P?
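The priorities and the NT-order guidelines above can be sketched as a small selection layer in Python. Everything here is an assumption made for illustration:

- R is represented by the lead monomials of its elements.
- P is a dictionary from hypothetical index pairs to their saved MLCMs; the text notes these can be saved when P is built.
- The implicit order is lex, as in the earlier sketches.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Monomial = Tuple[int, ...]
Pair = Tuple[int, int]  # hypothetical: indices of f and g in G


def nt_key(m: Monomial) -> Tuple[int, Tuple[int, ...]]:
    # NT order on monomials, assuming a lex implicit order
    return (sum(m), tuple(-e for e in m))


def next_move(P: Sequence, H: Sequence, R: Sequence) -> Optional[str]:
    """Priorities 1-3: MOVE R, then MOVE H, then MOVE P."""
    if R:
        return "R"
    if H:
        return "H"
    if P:
        return "P"
    return None  # P, H and R are empty: G is a Grobner basis


def choose_from_r(r_leads: List[Monomial]) -> int:
    """Index of the element of R whose lead monomial is NT-smallest."""
    return min(range(len(r_leads)), key=lambda i: nt_key(r_leads[i]))


def choose_pair(pair_mlcm: Dict[Pair, Monomial]) -> Pair:
    """Pair (f, g) in P whose saved MLCM(f, g) is NT-smallest."""
    return min(pair_mlcm, key=lambda p: nt_key(pair_mlcm[p]))
```

Because `choose_pair` reads only the saved MLCMs, no S-polynomial is formed until a pair is actually selected. This is consistent with the remark that pairs may be discarded without ever computing S(f, g).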

When the minimal guidelines, based on small lead monomials in the NT order, together with our other prospective optimizations, are followed for small hand computations, we have been pleased with how few S-pairs are ever computed.
ABSTRACT

Variational models of phase transitions seek to explain the observed fine scale structures of phase mixtures as being due to bulk energy minimization. One theory of this type, geometrically linear in character, has been developed in the metallurgical literature by Khachaturyan and others. A second, apparently different, geometrically nonlinear theory has been developed in the mathematical literature by Ball, James, and others. We show that Khachaturyan’s theory is roughly the linearization of Ball and James’ approach. We also discuss how Khachaturyan’s method permits the explicit relaxation of certain double-well energies in the linear setting. The corresponding calculation for triple-well energies remains incomplete.


1.  Introduction 

Coherent mixtures of crystalline solids have long been studied using elasticity. The metallurgical literature has primarily been based on linear theory, see e.g. [25,26,31-33,41,45,48,51-56,60,61]. Recent mathematical work, on the other hand, has taken a geometrically nonlinear viewpoint, see e.g. [4-6,8,11-15,18-22,27-30,34]. There is naturally a connection between these two approaches, and indeed the link is much stronger than has heretofore been recognized. The purpose of this article is to explore that connection in some detail, focussing particularly on the nonlinear variational model of Ball and James [4,27] and on work done in the linearized context by Roitburd, Khachaturyan, and Shatalov [31,33,51].

*Supported by ARO contract DAAL03-89-K-0039, DARPA contract F49620-87-C-0065, ONR grant N00014-88-K-0279 and NSF grant DMS-8701893.


Coherent phase mixtures arise, for example, from martensitic phase transitions and in the early stages of decomposition processes. They have rather characteristic fine scale structures, often involving laminar arrangements of phases or distributions of like-shaped inclusions. A central goal of the variational theory is to explain the origin of these microstructures. A comprehensive survey is beyond the scope of this article, due to the vastness of the literature and also the limitations of the author’s expertise. Nevertheless, we attempt a brief introduction.

It is precisely the condition of coherence that permits an analysis based on elasticity. Briefly, this condition assures that the atoms’ actual positions are related to their locations in a reference lattice by a continuous elastic deformation. More detailed discussions of this point will be found in [42-44], where the central notion is the "network constraint," and in [11,12], where the discussion is based on the "Born rule."

One well-known approach in the linear context has its origins in the work of Eshelby [16,17], which gives the elastic field due to an ellipsoidal inclusion of one phase in an otherwise uniform second phase. This leads to an approximate formula for the elastic energy of a multi-phase mixture, either through a mean field theory or by taking a dilute distribution of ellipsoids as the phase geometry. One can try to predict the inclusion shape by minimizing this approximate energy, see e.g. [45,56]. Here we will not deal with such approximate theories, but rather with calculations that are (mathematically, if not physically) exact.

A different approach was developed independently by Khachaturyan [31] and Roitburd [51] for the two-component case, and subsequently generalized to more than two components by Khachaturyan and Shatalov [33]. Their theory is geometrically exact, in the sense that it makes no hypotheses about the phase geometry, and it computes the elastic fields exactly. One pays a price for this generality, however: their method requires that the phases all have the same elastic moduli. For mixtures of two such phases, this work gives a formula for the extremal elastic energy as a function of the stress-free strains, the volume fraction of each phase, and the elastic moduli [32]. It shows moreover that this extremal energy is always achieved by a layered microstructure. Subsequent work has noted that other microgeometries may also be extremal, depending on the symmetry of the individual phases. For three or more phases the analysis is less complete: the treatment of [31-33] does not yield a formula for the extremal energy of a general three-phase mixture.

The preceding work is all linear in character. Recently, a number of authors have explored models based on nonlinear elasticity. One approach leads to an elastic energy density with infinitely many local minima [14,15]; the associated "relaxed" energy is unfortunately rather degenerate [6,18-20,22]. We prefer the viewpoint of Ball and James [4,5,27], in which the elastic energy density has one "well" for each phase or phase variant. The relaxation of such an energy has yet to be computed; rather, attention has been focussed on determining where it achieves its minimum value. This is sufficient for studying transformations that do not involve internal stress, a class which includes many martensitic phase transitions.

The linear theory of Khachaturyan et al. and the nonlinear one of Ball and James are superficially quite different: the former deals with phase microgeometry directly, while the latter lets it enter through the structure of energy minimizing sequences. Both approaches involve the minimization of an elastic energy, however, and each provides a rationale for the more phenomenological "crystallographic theory of martensite" [7,59]. Thus it is natural to look for a relationship between them.

The connection is in fact quite close. Roughly speaking, Khachaturyan’s calculation is equivalent to the relaxation of a linearized version of the energy studied by Ball and James. Our goal is to explain this relationship, and hopefully to bridge the language barrier which currently separates the two theories.

We begin, in Section 2, with the linearization of the Ball-James theory. The linear analogue of their energy turns out to be a minimum of paraboloids, differing as to their shape, height, and the locations of their minima. Each paraboloid is the graph of the linearly elastic energy for a separate phase or phase variant.

In Section 3 we explain what one means by the "relaxed" or "macroscopic" energy $QW$ associated to a given energy density $W$. The functions that emerge from Section 2 have the special form $W = \min\{W^1, \ldots, W^N\}$. For such functions we introduce a new concept, the "relaxation at fixed volume fraction" $Q_\theta W$. It gives the macroscopic energy of the system when both the average strain and the volume fractions of the phases are fixed. The standard relaxation $QW$ is just $\inf_\theta Q_\theta W$, in which $\theta$ varies over all possible volume fractions (see Proposition 3.1); thus knowledge of $Q_\theta W$ for all $\theta$ effectively determines $QW$.

For a system consisting of two linear phases with the same elastic moduli, the calculation of Roitburd and Khachaturyan amounts to the determination of $Q_\theta W$. This is the central link between the linear and nonlinear viewpoints, and it is discussed in Section 4. A more complete discussion concerning the relaxation of double-well energy functions will be found in [36].

The corresponding analysis for three or more linear phases is presented in Section 5. Our approach is essentially the same as that of Khachaturyan and Shatalov [33], and we get no further than they do. In selected cases, when the stress-free strains are related appropriately, one can determine the minimum value of $Q_\theta W$; this is the linear analogue of work by Ball and James [4]. In general, however, the calculation of $Q_\theta W(\xi)$ remains open. The new notion of H-measures, recently introduced independently by Tartar [38] and Gérard [24], may be useful in this context: as we shall explain, calculating $Q_\theta W$ is equivalent to minimizing a certain functional over the H-measures associated to certain characteristic functions.

In its use of Fourier analysis, Khachaturyan’s calculation of $Q_\theta W$ bears a strong resemblance to recent work in homogenization [2,38,50]; thus we are in essence using homogenization to compute the relaxations of certain energy integrands. This link between relaxation and homogenization has in fact been noted before [39,40]. The main difference between those discussions and the present one is that here the phases have different stress-free strains and the same elastic moduli.

Our attention is concentrated entirely on the minimization of bulk energy; one expects such an analysis to be qualitatively correct if the effects of surface energy are sufficiently small. The latter are presumably important for determining the length scale and periodicity of the microstructure; they are also thought to be the reason for the appearance of plate-like inclusions when the theory predicts fine-scale layering. Formal treatments of these effects will be found in [4,32]; there has been little rigorous analysis, but see [21,35] for some preliminary steps in that direction. In addition, surface energy is presumably responsible for selecting between distinct microstructures with the same bulk energy (see Remark 4.4).
Taken together, Proposition 3.1 and Theorem 4.1 determine the relaxation of a "two-well" energy describing a system of two linearly elastic phases with the same elastic moduli. For the special case of an isotropic elastic law in two space dimensions, that relaxation was previously computed by Lurie and Cherkaev [47]. Their analysis is quite different from ours, being based on the method of "polyconvexification" rather than Fourier analysis.

The hypothesis of equal elastic moduli is apparently a good approximation for many two-phase systems: microstructures consistent with Khachaturyan’s theory are observed in a wide variety of systems [32]. However, this approximation is certainly not always valid. We have recently extended Khachaturyan’s calculation to the case of two phases with different elastic moduli, provided that the elasticity tensors are in a certain sense well-ordered [37]. This extension is based on the Hashin-Shtrikman variational principle; it is similar to [2,38,50] except that the phases have different stress-free strains.

2. Linearization of the Ball-James Theory

Ball and James have developed a model for martensitic phase transitions based on finite elasticity [4]. Their idea is to minimize a non-elliptic energy function which has a separate "well" for each phase or phase variant. We focus on the case of a cubic-tetragonal phase transition such as that of InTl, following [4].

The elastic energy has the form

$$ E[u] = \int_\Omega W_T(\nabla u)\,dx, \qquad (2.1) $$

where $\Omega \subset \mathbb{R}^3$ is a reference domain, $u: \Omega \to \mathbb{R}^3$ is an elastic deformation, and $T$ represents temperature. The temperature could vary from point to point, i.e. $T = T(x)$; but it is considered given, not to be varied in the minimization of $E$. For simplicity, we shall assume for the duration of this discussion that $T$ is constant. The energy density satisfies the condition of frame indifference:

$$ W_T(RF) = W_T(F) \quad \text{for } R \in SO(3). \qquad (2.2) $$

($SO(3)$ is the group of orientation preserving rotations of $\mathbb{R}^3$.) As a consequence, the local minima of $W_T$ must occur on orbits of $SO(3)$.
For a cubic-tetragonal phase transition, $W_T$ has four "wells," one corresponding to the austenite (cubic) phase and the others to the three symmetry-related variants of martensite (the tetragonal phase). There is an exchange of stability at the transformation temperature $T_c$: for $T > T_c$ the absolute minimum is in the austenite well, whereas for $T < T_c$ it is in the martensite wells. We take austenite at $T = T_c$ as the reference configuration, with spatial axes aligned with the axes of symmetry. Then the minima of $W_T$ at $T = T_c$ are the orbits $\{R\}$, $\{RA_1\}$, $\{RA_2\}$, and $\{RA_3\}$, where $R$ ranges over $SO(3)$ and

(2.3)

are  the  three  symmetry-related  transformation  strains.  According  to  [4],  t\  ~  .026  and 
8  »  .013  for  InTl. 

The main quantities one can measure are the transformation strains and the linear elastic moduli of each phase. These determine the locations of the relative minima of $W_T$ and its behavior near those minima. One might also impose the condition that $W_T(F) \to \infty$ as $\det F \to 0$, but basically the form of $W_T$ is open when $F$ is far from a natural state. One approach is to use a simple polynomial function for $W_T$, see e.g. [8,13]. However, we prefer to view each phase as having "its own" energy function, with $W_T$ being the minimum of the lot:

$$ W_T(F) = \min\{W_T^j(F),\ j = 0,1,2,3\}. \qquad (2.4) $$

Here $W_T^0$ corresponds to austenite, and $W_T^i$, $i = 1,2,3$, to martensite. At $T = T_c$, $W_T^0$ is minimized at the orbit of the identity and $W_T^i$ at that of $A_i$. Since the martensite phases are symmetry-related, their energies satisfy

$$ W_T^2(F) = W_T^1(F R_{12}), \qquad W_T^3(F) = W_T^1(F R_{13}) \qquad (2.5) $$

for some rotations $R_{12}$ and $R_{13}$ in the symmetry group of the cube but not in that of the tetragon. It follows from (2.5) that $A_2 = R_{12} A_1 R_{12}^T$ and $A_3 = R_{13} A_1 R_{13}^T$.

By frame indifference, each of these energies should depend only on $(F^TF)^{1/2}$. It is a natural approximation to assume that they are quadratic functions of $(F^TF)^{1/2}$. If we suppose for simplicity that the stress-free strains and elastic moduli are independent of $T$, this leads to

$$ W_T^0(F) = \langle \alpha_0[(F^TF)^{1/2} - I],\ (F^TF)^{1/2} - I \rangle + \Phi_a(T) $$

for the austenite, and

$$ W_T^i(F) = \langle \alpha_i[(F^TF)^{1/2} - A_i],\ (F^TF)^{1/2} - A_i \rangle + \Phi_m(T) $$

for the martensite. Here $\Phi_a(T)$ and $\Phi_m(T)$ are the energies of the stress-free states as a function of temperature; $\alpha_j$ ($j = 0, \ldots, 3$) is a symmetric linear map acting on symmetric tensors; and $\langle A, B \rangle = \mathrm{Tr}(AB)$ is the standard inner product of symmetric tensors. We shall see presently that $\alpha_j$ is precisely the Hooke’s law of the $j$th phase. The symmetry relations (2.5) imply that $\alpha_2$ and $\alpha_3$ arise from $\alpha_1$ through the action of $R_{12}$ and $R_{13}$ respectively, acting in the usual way on symmetric tensors; for example,

$$ \langle \alpha_1 (R_{12}^T \xi R_{12}),\ R_{12}^T \xi R_{12} \rangle = \langle \alpha_2 \xi, \xi \rangle $$

for every symmetric tensor $\xi$.

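Now let us linearize $W_T$. This is done by taking

$$ F = I + \epsilon f, \qquad A_i = I + \epsilon a_i, $$
$$ \Phi_m(T) = \epsilon^2 \phi_m(T), \qquad \Phi_a(T) = \epsilon^2 \phi_a(T), $$

and expanding to principal order in $\epsilon$. Since $(F^TF)^{1/2} = I + \frac{\epsilon}{2}(f + f^T) + O(\epsilon^2)$, one easily obtains

$$ W_T^0(F) = \epsilon^2 \Big\{ \big\langle \alpha_0\big(\tfrac{f + f^T}{2}\big),\ \tfrac{f + f^T}{2} \big\rangle + \phi_a(T) \Big\} + O(\epsilon^3), $$
$$ W_T^i(F) = \epsilon^2 \Big\{ \big\langle \alpha_i\big(\tfrac{f + f^T}{2} - a_i\big),\ \tfrac{f + f^T}{2} - a_i \big\rangle + \phi_m(T) \Big\} + O(\epsilon^3). $$

Writing $\xi = \frac{1}{2}(f + f^T)$ for the linear strain tensor, we see that the linearization of (2.4) is

$$ W_T(\xi) = \min\{W_T^j(\xi),\ j = 0,1,2,3\} \qquad (2.6) $$

with

$$ W_T^0(\xi) = \langle \alpha_0 \xi, \xi \rangle + \phi_a(T), $$
$$ W_T^i(\xi) = \langle \alpha_i(\xi - a_i),\ \xi - a_i \rangle + \phi_m(T), \quad i = 1,2,3. $$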

Thus  the  graph  of  the  linearized  energy  WT  is  the  minimum  of  a  family  of  paraboloids, 
each  having  its  vertex  at  a  different  linear  strain. 
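To make the "minimum of paraboloids" picture concrete, here is a small Python sketch. It evaluates a linearized multi-well energy of the form (2.6) at a given strain and reports which well is active. Several choices are hypothetical and made only for illustration:

- the isotropic Hooke's law α(ξ) = 2μξ + λ tr(ξ) I (the α_j of the text are general symmetric maps on symmetric tensors);
- the parameter values and strains;
- the function names.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]          # symmetric 3x3 tensor
Hooke = Callable[[Matrix], Matrix]  # linear map on symmetric tensors


def inner(A: Matrix, B: Matrix) -> float:
    """<A, B> = Tr(AB)."""
    return sum(A[i][j] * B[j][i] for i in range(3) for j in range(3))


def isotropic_hooke(lam: float, mu: float) -> Hooke:
    """Hypothetical isotropic law: alpha(xi) = 2*mu*xi + lam*tr(xi)*I."""
    def alpha(xi: Matrix) -> Matrix:
        tr = xi[0][0] + xi[1][1] + xi[2][2]
        return [[2 * mu * xi[i][j] + (lam * tr if i == j else 0.0)
                 for j in range(3)] for i in range(3)]
    return alpha


def well(alpha: Hooke, a: Matrix, phi: float, xi: Matrix) -> float:
    """W^j(xi) = <alpha_j(xi - a_j), xi - a_j> + phi_j."""
    d = [[xi[i][j] - a[i][j] for j in range(3)] for i in range(3)]
    return inner(alpha(d), d) + phi


def multiwell(wells: Sequence[Tuple[Hooke, Matrix, float]],
              xi: Matrix) -> Tuple[float, int]:
    """W(xi) = min_j W^j(xi); also returns the index of the active well."""
    values = [well(alpha, a, phi, xi) for alpha, a, phi in wells]
    j = min(range(len(values)), key=values.__getitem__)
    return values[j], j


def diag(x: float, y: float, z: float) -> Matrix:
    return [[x, 0.0, 0.0], [0.0, y, 0.0], [0.0, 0.0, z]]


if __name__ == "__main__":
    alpha = isotropic_hooke(1.0, 1.0)
    a1 = diag(0.01, -0.01, 0.0)                        # hypothetical strain
    wells = [(alpha, diag(0.0, 0.0, 0.0), 0.0), (alpha, a1, 0.0)]
    print(multiwell(wells, diag(0.0, 0.0, 0.0)))       # (0.0, 0)
    print(multiwell(wells, a1))                        # (0.0, 1)
```

Each well contributes one paraboloid with vertex a_j and height φ_j, and the active index tells which phase "occupies" a region with that strain.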

The preceding argument is easily extended to allow the transformation strains and Hooke’s law to depend on temperature, and a similar analysis can clearly be done for other types of phase transitions. Notice that the linearization process requires $A_i - I$ as well as $F - I$ to be small; thus it is only reasonable to use a linearized theory when the lattice parameters of the phases are close to one another.

The  linearization  just  performed  is  of  course  only  formal.  It  is  not  at  all  clear  that  the 
original  elastic  energy  (2.1)  behaves  like  its  linearized  analogue  (2.6).  However,  the 
known  results  are  indicative  of  a  very  strong  connection.  For  example,  it  is  conjectured  in 
the  nonlinear  context  that  if  a  Young-measure  limit  of  gradients  is  supported  on  two  wells, 
then  those  wells  must  be  rank-one  related  [30].  We  shall  prove  a  very  similar  result  for 
the  linearized  setting  in  Section  4  (see  Proposition  4.4). 

3.  Relaxation  of  multiple-well  energy  functions 

This section explains the idea of relaxation in the context of linear elasticity; the central notion is the quasiconvexification $QW$ associated to an energy density $W$. For energies of the form $W = \min\{W^1, \ldots, W^N\}$ we introduce the related notion of the quasiconvexification at fixed volume fractions, $Q_\theta W$.

To explain why relaxation is of interest, consider a system of two linearly elastic phases with Hooke’s laws $\alpha_i$ and stress-free strains $a_i$, $i = 1,2$:

$$ W_T(\xi) = \min\{W_T^1(\xi),\ W_T^2(\xi)\}, $$
$$ W_T^i(\xi) = \langle \alpha_i(\xi - a_i),\ \xi - a_i \rangle + \phi_i(T). \qquad (3.1) $$

Here $\xi$ is the linear elastic strain, $T$ is temperature, and $\phi_i(T)$ is the minimum energy of the $i$th phase at temperature $T$. We assume that the phases exchange stability at $T = T_c$, say $\phi_1 < \phi_2$ for $T < T_c$ and $\phi_1 > \phi_2$ for $T > T_c$.

Suppose that such a system is held in a variable temperature field $T = T(x)$, with no body loads or surface tractions. The elastic energy is then

$$ E[u] = \int_\Omega W_{T(x)}(e(u))\,dx, \qquad (3.2) $$

with $e(u) = \frac{1}{2}(\nabla u + \nabla u^T)$. At first glance the minimization of $E$ might seem trivial: phase 1 is preferred for $T < T_c$ and phase 2 for $T > T_c$, so it is tempting to look for a solution of the form

$$ e(u) = \begin{cases} a_1 & T < T_c \\ a_2 & T > T_c. \end{cases} $$


This does not work, however: for such a deformation to exist, $a_1 - a_2$ must have the form $n \otimes m + m \otimes n$, and the surface $\{T = T_c\}$ must consist of hyperplanes normal to either $n$ or $m$. A second idea would be to consider the optimality conditions for (3.2); but this, too, is ill-conceived, since it is based on the assumption that a solution exists. In fact, it is not clear that the minimum of (3.2) is achieved; rather, it is possible for a minimizing sequence to develop oscillatory spatial gradients. Physically, this arises because a fine-scale mixture of both phases may lead to a lower energy than either pure phase. (We view the set where $W^1(e(u)) < W^2(e(u))$ as being occupied by phase 1, and its complement by phase 2.) Indeed, $E[u]$ is most interesting when its minimum is not achieved: that is the case when energy minimization requires a mixture of the two phases. The minimizing sequences for $E[u]$ determine the preferred microstructures for phase mixtures.
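The compatibility condition above can be checked directly from the stress-free strains. For a symmetric 3 × 3 matrix D, it is a standard fact that D = n ⊗ m + m ⊗ n for some vectors n, m exactly when the middle eigenvalue of D is zero. Equivalently, det D = 0 and the sum of the principal 2 × 2 minors, which then equals the product of the two remaining eigenvalues, is ≤ 0. Below is a minimal Python sketch; the absolute tolerance and the trace-free strains in the usage example are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def det3(A: Matrix) -> float:
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))


def principal_minor_sum(A: Matrix) -> float:
    """Second invariant: sum of the three principal 2x2 minors."""
    return ((A[0][0] * A[1][1] - A[0][1] * A[1][0])
            + (A[0][0] * A[2][2] - A[0][2] * A[2][0])
            + (A[1][1] * A[2][2] - A[1][2] * A[2][1]))


def is_compatible(a1: Matrix, a2: Matrix, tol: float = 1e-12) -> bool:
    """True if a1 - a2 = n (x) m + m (x) n for some vectors n, m.

    For symmetric D = a1 - a2 this holds iff the middle eigenvalue of D
    is zero, i.e. det D = 0 and the other two eigenvalues do not share
    a strict sign (their product, the minor sum, is <= 0).
    The absolute tolerance is an arbitrary choice suited to small strains.
    """
    D = [[a1[i][j] - a2[i][j] for j in range(3)] for i in range(3)]
    return abs(det3(D)) <= tol and principal_minor_sum(D) <= tol


if __name__ == "__main__":
    zero = [[0.0] * 3 for _ in range(3)]
    v1 = [[0.02, 0, 0], [0, -0.01, 0], [0, 0, -0.01]]   # hypothetical variant 1
    v2 = [[-0.01, 0, 0], [0, 0.02, 0], [0, 0, -0.01]]   # hypothetical variant 2
    print(is_compatible(v1, v2))    # True
    print(is_compatible(v1, zero))  # False
```

With these illustrative strains, the two variants can meet along a plane, while a variant and a stress-free austenite cannot. In the latter case no deformation of the simple two-region form exists, and a fine-scale mixture becomes relevant.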

The technique of relaxation is, in essence, a method for constructing minimizing sequences of nonconvex variational problems such as (3.2). The relaxed problem has a similar form, with $W_T$ replaced by its "quasiconvexification" $QW_T$:

$$ \int_\Omega QW_{T(x)}(e(u))\,dx. \qquad (3.3) $$

Though the relaxed problem need not be convex, its minimum is achieved; indeed, its minimizers are precisely the weak limits of minimizing sequences of the original problem. The relaxed integrand is defined, for each $T$, by

$$ QW_T(\xi) = \inf_{v|_{\partial U} = \xi x} \frac{1}{|U|} \int_U W_T(e(v))\,dx. \qquad (3.4) $$

We need not specify the domain $U$, since the value of the infimum is the same for all domains with reasonably regular boundaries [3,9,10].
The introduction of the relaxed energy is physically quite natural. We think of $W_T$ as the "microscopic" energy function, and view $QW_T$ as an associated "macroscopic" energy: it gives, in essence, the minimum average energy when the average strain is $\xi$. Given knowledge of the minimizing sequences for (3.4) and a minimizer of (3.3), one easily constructs a minimizing sequence for the original energy $E$: this is done, roughly speaking, by superimposing the oscillations prescribed by (3.4) upon the slowly-varying strain of the solution to (3.3). In particular, the preferred phase microstructures associated to a given strain $\xi$ at temperature $T$ are determined by minimizing sequences for (3.4). We refer to Section 2 of [40] for an expository discussion of the basic facts about relaxation, and to [1,3,9,10] for more comprehensive treatments including proofs. (These references discuss functions of $\nabla u$ rather than functions of $e(u) = \frac{1}{2}(\nabla u + \nabla u^T)$. But they require no coercivity hypotheses, so the results apply a priori in the context of linear elasticity.)

The definition of $QW_T$, (3.4), applies generally, whatever the form of the energy $W_T$. From Section 2, however, we see that for modelling phase transitions it is natural to consider energies of the special form

$$ W(\xi) = \min\{W^1(\xi), \ldots, W^N(\xi)\}. \qquad (3.5) $$

(We suppress the parameter $T$ for simplicity of notation; the particular form of the $W^j$ will not matter for what follows.) For such $W$, we now define the quasiconvexification with fixed volume fractions, $Q_\theta W$.

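Let $V$ denote the set of all possible volume fractions:

$$ V = \{\theta = (\theta_1, \ldots, \theta_N) : \theta_j \ge 0,\ \textstyle\sum_j \theta_j = 1\}. $$

Fixing a region $U$ of $\mathbb{R}^3$, we say a partition $U = U_1 \cup \cdots \cup U_N$ has volume fraction $\theta$ if $|U_j| = \theta_j |U|$, $1 \le j \le N$. It is convenient to represent such a partition by its "marker functions"

$$ \chi_j(x) = \begin{cases} 1 & x \in U_j \\ 0 & \text{otherwise} \end{cases}, \qquad 1 \le j \le N; $$

note that $\chi_i \chi_j = \delta_{ij} \chi_j$, $\sum_j \chi_j = 1$, and

$$ \frac{1}{|U|} \int_U \chi_j = \theta_j. $$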
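On a discretized region, the marker functions and the volume fraction of a partition are easy to compute. The following Python sketch assumes, for illustration only, that U is divided into equal-volume cells and that the partition is given by a hypothetical list of 0-based phase labels, one per cell. Because each cell carries exactly one label, χ_iχ_j = δ_ij χ_j holds by construction. The sketch checks Σ_j χ_j = 1 and returns θ.

```python
from fractions import Fraction
from typing import List, Sequence


def marker_functions(labels: Sequence[int], n_phases: int) -> List[List[int]]:
    """chi[j][k] = 1 if cell k lies in U_j, else 0."""
    return [[1 if lab == j else 0 for lab in labels] for j in range(n_phases)]


def volume_fraction(labels: Sequence[int], n_phases: int) -> List[Fraction]:
    """theta_j = average of chi_j over equal-volume cells."""
    chi = marker_functions(labels, n_phases)
    for k in range(len(labels)):
        # sum_j chi_j = 1 fails only if a label lies outside 0..n_phases-1
        if sum(chi[j][k] for j in range(n_phases)) != 1:
            raise ValueError(f"cell {k} has a label outside 0..{n_phases - 1}")
    return [Fraction(sum(c), len(labels)) for c in chi]
```

For instance, the labels [0, 0, 1, 1, 1, 2, 0, 1] on eight cells give θ = (3/8, 1/2, 1/8), which lies in V.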
For $\theta \in V$, we set

$$ Q_\theta W(\xi) = \inf_{\{\chi_j\}} \inf_{v|_{\partial U} = \xi x} \frac{1}{|U|} \int_U \sum_j \chi_j W^j(e(v))\,dx, \qquad (3.6) $$

where $\{\chi_j\}_{j=1}^N$ ranges over marker functions associated to partitions of $U$ with volume fraction $\theta$. As was the case for (3.4), we need not specify the domain $U$: the value of the infimum is independent of $U$. (The proof is parallel to that for $QW$; see for example Proposition 2.3 of [3].)

The following proposition asserts that if $Q_\theta W$ is known for every $\theta \in V$, then the determination of $QW$ requires only a finite-dimensional optimization.

Proposition 3.1:

$$ QW(\xi) = \inf_{\theta \in V} Q_\theta W(\xi). $$

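Proof: Clearly

$$\inf_{\theta \in \mathcal{V}} Q_\theta W(\xi) = \inf_{\{\chi_j\}}\ \inf_{v|_{\partial U} = \xi x}\ \frac{1}{|U|}\int_U \sum_j \chi_j W^j(e(v))\,dx, \qquad (3.7)$$

where $\{\chi_j\}$ now ranges over the marker functions associated to all partitions, regardless of volume fraction. The right side of (3.7) is not altered if we interchange the order of the two minimizations. But for fixed $v$,

$$\inf_{\{\chi_j\}} \int_U \sum_j \chi_j W^j(e(v))\,dx = \int_U W(e(v))\,dx,$$

with $W$ given by (3.5): the optimal partition has

$$\chi_j = 1 \quad \text{where} \quad W^j(e(v)) = \min_i W^i(e(v)).$$

Thus

$$\inf_{\theta \in \mathcal{V}} Q_\theta W(\xi) = \inf_{v|_{\partial U} = \xi x} \frac{1}{|U|}\int_U W(e(v))\,dx = QW(\xi).$$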


We require one more lemma concerning the general notion of quasiconvexification. The integrands $QW$ and $Q_\theta W$ were defined above using the Dirichlet boundary condition $v|_{\partial U} = \xi x$ on an arbitrary domain $U$ (see (3.4) and (3.6)). However, there is an equivalent characterization involving the averaging of periodic functions. This will be convenient in Section 4, where we will use Fourier analysis to calculate $Q_\theta W$ for certain two-well energies. We choose $C = [0, 2\pi]^3$ as the unit cell in $\mathbb{R}^3$; $\bar f$ denotes the average value of a $C$-periodic function $f$.

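Lemma 3.2: The quasiconvexification has the alternate characterization

$$QW(\xi) = \inf_{\phi\ \mathrm{per}} \frac{1}{|C|}\int_C W(\xi + e(\phi))\,dx, \qquad (3.8)$$

in which $\phi$ ranges over all $C$-periodic maps from $\mathbb{R}^3$ to $\mathbb{R}^3$. Similarly,

$$Q_\theta W(\xi) = \inf_{\{\chi_j\}}\ \inf_{\phi\ \mathrm{per}} \frac{1}{|C|}\int_C \sum_j \chi_j W^j(\xi + e(\phi))\,dx, \qquad (3.9)$$

where $\{\chi_j\}$ range over periodic marker functions associated with partitions of $C$ with volume fraction $\theta$, and $\phi$ ranges over $C$-periodic maps from $\mathbb{R}^3$ to $\mathbb{R}^3$.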

Proof (sketch): We assume that $W$ is continuous with $L^p$ growth. It is easy to see that (3.8) $\le$ (3.4): we may take $U = C$ in (3.4) and write $v = \xi x + \phi$ with $\phi|_{\partial C} = 0$, then extend $\phi$ periodically to get an admissible test field for (3.8). Conversely, if $\phi$ is any periodic test field, then $\phi_N(x) = N^{-1}\phi(Nx)$ is again periodic for any integer $N$, and it gives the same value as $\phi$ when substituted into (3.8). If $N$ is large, then we can modify $\phi_N$ on a thin transition layer to make it vanish at $\partial C$, while leaving its energy (the right side of (3.8)) virtually unchanged; therefore (3.4) $\le$ (3.8). The argument that (3.6) = (3.9) is essentially the same.
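The rescaling step in this proof can be checked in a one-dimensional toy setting. The sketch below assumes a scalar double-well energy $W(s) = (s^2 - 1)^2$ and the test field $\phi = \sin$ (both hypothetical choices, as are the helper names); it confirms that $\phi_N(x) = N^{-1}\phi(Nx)$ has the same cell-averaged energy as $\phi$, while $\sup|\phi_N|$ shrinks like $1/N$, which is what makes the thin transition layer cheap.

```python
import math


def W(s: float) -> float:
    """Hypothetical 1D double-well energy with wells at s = -1 and s = +1."""
    return (s * s - 1.0) ** 2


def phi(x: float) -> float:
    """Illustrative 2*pi-periodic test field."""
    return math.sin(x)


def dphi(x: float) -> float:
    """Strain (derivative) of phi."""
    return math.cos(x)


def cell_average(xi: float, strain, m: int = 256) -> float:
    """Average of W(xi + strain(x)) over one period, by the periodic trapezoid rule."""
    h = 2.0 * math.pi / m
    return sum(W(xi + strain(j * h)) for j in range(m)) / m


def rescaled_strain(n: int):
    """Strain of phi_N(x) = phi(N x) / N, which by the chain rule is phi'(N x)."""
    return lambda x: dphi(n * x)


def rescaled_sup(n: int, m: int = 256) -> float:
    """Grid maximum of |phi_N|; it is bounded by max|phi| / N."""
    h = 2.0 * math.pi / m
    return max(abs(phi(n * j * h)) / n for j in range(m))


for n in (1, 2, 4, 8):
    print(n, cell_average(0.3, rescaled_strain(n)), rescaled_sup(n))
```

The trapezoid rule is used because it is exact, up to round-off, for the trigonometric polynomials that arise in this example, so the averages agree for every integer $N$.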


A  complete  proof  of  Lemma  3.2  will  be  found  in  [36].  The  fact  that  periodic  test 
fields  are  sufficient  to  test  for  quasiconvexity  was  previously  noted  in  [3]. 
4. Calculation of $Q_\theta W$ for two linear phases with the same elastic moduli.

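In this section we present the calculation of $Q_\theta W$ when $W$ is the energy associated to a system of two linear phases with distinct stress-free strains $\alpha_1, \alpha_2$ but the same tensor of elastic moduli $a$:

$$W(\xi) = \min\{W^1(\xi), W^2(\xi)\}, \qquad W^i(\xi) = \langle a(\xi - \alpha_i), \xi - \alpha_i\rangle, \quad i = 1, 2. \qquad (4.1)$$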


Our method is basically the same as that used by Khachaturyan in [31,32]; however, we emphasize the role of certain projection operators rather than that of the Green's function associated to $a$.


We begin with some notation. Let $S$ be the 6-dimensional space of symmetric tensors. For any $k \in \mathbb{R}^3$, let

$$V(k) = \{k \otimes v + v \otimes k : v \in \mathbb{R}^3\}, \qquad (4.2)$$

which (for $k \ne 0$) is a 3-dimensional subspace of $S$. For any subspace $V$ of $S$ we write $\pi_V \xi$ for the orthogonal projection of $\xi$ onto $V$.

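Theorem 4.1: Let $W$ be given by (4.1), and let $\theta = (\theta_1, \theta_2) \in \mathcal{V}$. Then

$$Q_\theta W(\xi) = \theta_1 W^1(\xi) + \theta_2 W^2(\xi) - \theta_1\theta_2\, g \qquad (4.3)$$

with

$$g = \max_{|n| = 1} \left|\pi_{a^{1/2}V(n)}\, a^{1/2}(\alpha_1 - \alpha_2)\right|^2. \qquad (4.4)$$

Whenever $n^*$ is extremal for (4.4), a laminar microstructure with $n^*$ as the layer normal gives an optimal phase arrangement.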
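To make (4.4) concrete, the following sketch evaluates $g$ by brute force in the special case, assumed here for simplicity, that $a$ is the identity on $S$, so that $W^i(\xi) = |\xi - \alpha_i|^2$ and $\pi_{a^{1/2}V(n)} = \pi_{V(n)}$. The closed form $|\pi_{V(n)}\varepsilon|^2 = 2|\varepsilon n|^2 - (n \cdot \varepsilon n)^2$ used below is our own least-squares computation, not a formula from the text, and the latitude-longitude grid only approximates the maximum over $S^2$ (in these examples it happens to contain exact maximizers).

```python
import math


def proj_sq_norm(eps, n):
    """|pi_{V(n)} eps|^2 for a unit vector n, assuming a is the identity on S.

    Minimizing |eps - (n⊗v + v⊗n)|^2 over v in R^3 gives the closed form
    2|eps n|^2 - (n . eps n)^2 (Frobenius norm on symmetric 3x3 matrices).
    """
    en = [sum(eps[i][j] * n[j] for j in range(3)) for i in range(3)]
    n_en = sum(n[i] * en[i] for i in range(3))
    return 2.0 * sum(c * c for c in en) - n_en * n_en


def g_constant(eps, n_theta: int = 91, n_phi: int = 360) -> float:
    """Grid approximation of g = max over |n| = 1 of |pi_{V(n)} eps|^2."""
    best = -math.inf
    for it in range(n_theta):
        t = math.pi * it / (n_theta - 1)
        for ip in range(n_phi):
            p = 2.0 * math.pi * ip / n_phi
            n = (math.sin(t) * math.cos(p), math.sin(t) * math.sin(p), math.cos(t))
            best = max(best, proj_sq_norm(eps, n))
    return best


def frobenius_sq(eps) -> float:
    return sum(eps[i][j] ** 2 for i in range(3) for j in range(3))


# p⊗q + q⊗p with p = (e1 + e2)/sqrt(2), q = (e1 - e2)/sqrt(2): of the form (4.11)
dyadic = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 0.0]]
# pure dilatation: not of the form (4.11)
dilatation = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(g_constant(dyadic), frobenius_sq(dyadic))
print(g_constant(dilatation), frobenius_sq(dilatation))
```

For the dyadic difference the computed $g$ matches $|\alpha_1 - \alpha_2|^2 = 2$, while for the pure dilatation $g = 1 < 3$, in line with the discussion of (4.11).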


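Proof: In view of Lemma 3.2, we must minimize

$$\frac{1}{|C|}\int_C \left[\chi_1 W^1(\xi + e(\phi)) + \chi_2 W^2(\xi + e(\phi))\right]dx \qquad (4.5)$$

over periodic marker functions $\chi_1$, $\chi_2 = 1 - \chi_1$ and over periodic deformation fields $\phi$. Elementary manipulation using (4.1) transforms the integrand of (4.5) into

$$\chi_1 W^1(\xi) + \chi_2 W^2(\xi) + \langle a e(\phi), e(\phi)\rangle + 2\langle \xi, a e(\phi)\rangle - 2\langle \chi_1\alpha_1 + \chi_2\alpha_2, a e(\phi)\rangle.$$

Since $\bar\chi_i = \theta_i$, the integrals of the first two terms are determined:

$$\frac{1}{|C|}\int_C \left[\chi_1 W^1(\xi) + \chi_2 W^2(\xi)\right] = \theta_1 W^1(\xi) + \theta_2 W^2(\xi).$$

Since $\phi$ is periodic and $\chi_1 + \chi_2 = 1$, we can integrate by parts in the last two terms to get

$$2\,\frac{1}{|C|}\int_C \left[\langle \xi, a e(\phi)\rangle - \langle \chi_1\alpha_1 + \chi_2\alpha_2, a e(\phi)\rangle\right] = -2\,\frac{1}{|C|}\int_C \chi_1\langle \alpha_1 - \alpha_2, a e(\phi)\rangle.$$

Thus to prove (4.3) we must establish that

$$\inf_{\chi_1}\ \inf_\phi\ \frac{1}{|C|}\int_C \langle a e(\phi), e(\phi)\rangle - 2\langle \chi_1(\alpha_1 - \alpha_2), a e(\phi)\rangle = -\theta_1\theta_2\, g, \qquad (4.6)$$

with $g$ defined by (4.4).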

Fixing $\chi_1$, we shall compute the minimum over $\phi$ in (4.6) using Fourier analysis. Since our functions are periodic with period $2\pi$ in each variable, their Fourier transforms are supported on the lattice of integers $\mathbb{Z}^3$, e.g.

$$\chi_1(x) = \sum_{k \in \mathbb{Z}^3} \hat\chi_1(k)\, e^{ik\cdot x}.$$


The integral in (4.6) can be rewritten as

$$\frac{1}{|C|}\int_C |a^{1/2}e(\phi)|^2 - 2\chi_1\langle a^{1/2}(\alpha_1 - \alpha_2), a^{1/2}e(\phi)\rangle,$$

where $a^{1/2}$ is the square root of $a$, itself a positive definite symmetric map on the space $S$ of symmetric tensors. By Plancherel's formula, this is equal to

$$\sum_{k \in \mathbb{Z}^3} |a^{1/2}\hat e(\phi)(k)|^2 - 2\,\mathrm{Re}\,\langle a^{1/2}\hat e(\phi)(k), \hat\chi_1(k)\, a^{1/2}(\alpha_1 - \alpha_2)\rangle, \qquad (4.7)$$

in which $\hat e(\phi)(k)$ and $\hat\chi_1(k)$ denote Fourier coefficients, and $\langle\cdot,\cdot\rangle$ is the Hermitian inner product on complex matrices. Choosing $\phi$ to minimize (4.6) is the same as choosing $\hat\phi$ to minimize (4.7), which may be done separately at each $k$. Frequency 0 is special: it contributes nothing to (4.7), since $\hat e(\phi)(0) = 0$ for any periodic $\phi$. When $k \ne 0$, the optimal value of $\eta = a^{1/2}\hat e(\phi)(k)$ is obtained by minimizing

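$$|\eta|^2 - 2\,\mathrm{Re}\,\langle \eta, \hat\chi_1(k)\, a^{1/2}(\alpha_1 - \alpha_2)\rangle \qquad (4.8)$$

over the space of all possible values of $a^{1/2}\hat e(\phi)(k)$, which is the complexification of $a^{1/2}V(k)$. The necessary linear algebra is presented as Lemma 4.2 below; the optimal $\eta$ turns out to be

$$\eta = \hat\chi_1(k)\, \pi_{a^{1/2}V(k)}\, a^{1/2}(\alpha_1 - \alpha_2),$$

and substitution into (4.8) gives the value

$$-|\hat\chi_1(k)|^2\, \left|\pi_{a^{1/2}V(k)}\, a^{1/2}(\alpha_1 - \alpha_2)\right|^2.$$

Thus for given $\chi_1$, the minimum over $\phi$ of the integral in (4.6) is

$$-\sum_{k \ne 0} |\hat\chi_1(k)|^2\, \left|\pi_{a^{1/2}V(k)}\, a^{1/2}(\alpha_1 - \alpha_2)\right|^2.$$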

Next we must minimize this expression over $\chi_1$. Since the subspace $V(k)$ depends only on $k/|k|$, it is immediate from (4.4) that

$$\left|\pi_{a^{1/2}V(k)}\, a^{1/2}(\alpha_1 - \alpha_2)\right|^2 \le g,$$

with equality if and only if $n = k/|k|$ is extremal for (4.4). Since $\hat\chi_1(0) = \bar\chi_1 = \theta_1$, another application of Plancherel's formula gives

$$\sum_{k \ne 0} |\hat\chi_1(k)|^2 = \frac{1}{|C|}\int_C (\chi_1 - \theta_1)^2 = \theta_1\theta_2.$$

It follows that

$$-\sum_{k \ne 0} |\hat\chi_1(k)|^2\, \left|\pi_{a^{1/2}V(k)}\, a^{1/2}(\alpha_1 - \alpha_2)\right|^2 \ge -\theta_1\theta_2\, g, \qquad (4.9)$$

with equality if $k/|k|$ is extremal for (4.4) whenever $\hat\chi_1(k) \ne 0$.
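The identity $\sum_{k \ne 0}|\hat\chi_1(k)|^2 = \theta_1\theta_2$ used above does not depend on how the phases are arranged. A discrete analogue is easy to check: the sketch below samples a one-dimensional marker function on a uniform grid of one period (an assumption of the illustration) and relies on the discrete Parseval identity.

```python
import cmath


def fourier_coefficients(values):
    """Discrete coefficients c_k = (1/m) sum_j v_j exp(-2 pi i k j / m), k = 0..m-1."""
    m = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * j / m) for j, v in enumerate(values)) / m
        for k in range(m)
    ]


def nonzero_mode_energy(chi):
    """Sum over k != 0 of |chi_hat(k)|^2 for a sampled marker function."""
    coeffs = fourier_coefficients(chi)
    return sum(abs(c) ** 2 for c in coeffs[1:])


m = 64
layered = [1 if j < 16 else 0 for j in range(m)]              # one band, theta_1 = 1/4
scattered = [1 if (7 * j) % m < 16 else 0 for j in range(m)]  # same fraction, scattered cells
print(nonzero_mode_energy(layered), nonzero_mode_energy(scattered))  # both theta_1 * theta_2
```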

To complete the proof of (4.6), we must show that equality can be approached in (4.9). To this end, note that a layered geometry has its Fourier transform concentrated on the line parallel to the layer normal; in other words, if $\chi_1(x) = f(x \cdot k)$ with $f$ periodic, then $\hat\chi_1$ is supported on the line through $k$. For $k \in \mathbb{Z}^3$ this $\chi_1$ is $C$-periodic provided that $f$ has period $2\pi$, and $\bar\chi_1 = \theta_1$ provided that $\bar f = \theta_1$. As $k/|k|$ approaches an extremal for (4.4), this layered phase arrangement establishes the optimality of the lower bound (4.9), completing the proof.
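The claim that a laminate $\chi_1(x) = f(x \cdot k)$ has Fourier support on the line through $k$ can be seen on a two-dimensional grid. In the hypothetical discretization below, frequencies are only defined modulo the grid size $m$, so "the line through $k$" becomes the set of residues $t\,k \bmod m$.

```python
import cmath


def laminate(m: int, k, theta1: float):
    """Marker chi_1 = f(k . x) on an m x m grid of the cell, f a 2*pi-periodic step of mean theta1."""
    k1, k2 = k
    cut = theta1 * m
    return [[1 if (k1 * i + k2 * j) % m < cut else 0 for j in range(m)] for i in range(m)]


def fourier_support(grid, tol: float = 1e-9):
    """Frequencies (p, q) mod m at which the discrete Fourier coefficient is nonzero."""
    m = len(grid)
    ones = [(i, j) for i in range(m) for j in range(m) if grid[i][j]]
    support = set()
    for p in range(m):
        for q in range(m):
            c = sum(cmath.exp(-2j * cmath.pi * (p * i + q * j) / m) for i, j in ones) / (m * m)
            if abs(c) > tol:
                support.add((p, q))
    return support


def on_line_through(pq, k, m: int) -> bool:
    """True if pq is congruent (mod m) to an integer multiple of k."""
    return any(((t * k[0]) % m, (t * k[1]) % m) == pq for t in range(m))


m, k = 16, (1, 2)
supp = fourier_support(laminate(m, k, 0.25))
print(sorted(supp))
print(all(on_line_through(pq, k, m) for pq in supp))
```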
In minimizing (4.8), we made use of the following lemma from linear algebra:

Lemma 4.2. Let $V$ be a subspace of a finite dimensional real inner product space $S$, and let $V_{\mathbb{C}} \subset S_{\mathbb{C}}$ be their complexifications. For any $\xi \in S$ and any complex number $c$,

$$\min_{\eta \in V_{\mathbb{C}}} |\eta|^2 - 2\,\mathrm{Re}\,\langle \eta, c\xi\rangle = -|c|^2\, |\pi_V\xi|^2, \qquad (4.10)$$

the extremal $\eta$ being $\eta^* = c\,\pi_V\xi$.

Proof: The function $f(\eta) = |\eta|^2 - 2\,\mathrm{Re}\,\langle \eta, c\xi\rangle$ is convex. The proposed extremal $\eta^*$ belongs to $V_{\mathbb{C}}$, and the directional derivative of $f$ at $\eta^*$ vanishes in directions $\tau \in V_{\mathbb{C}}$:

$$D_\tau f(\eta^*) = 2\,\mathrm{Re}\,\langle \eta^* - c\xi, \tau\rangle = \langle c(\pi_V\xi - \xi), \tau\rangle + \overline{\langle c(\pi_V\xi - \xi), \tau\rangle} = 2\,\langle \pi_V\xi - \xi, \mathrm{Re}(\bar c\tau)\rangle = 0,$$

since $\mathrm{Re}(\bar c\tau) \in V$. It follows that the minimum of $f$ on $V_{\mathbb{C}}$ is achieved at $\eta^*$, and substitution yields (4.10).
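Lemma 4.2 can be spot-checked numerically. The sketch assumes $S = \mathbb{R}^3$ with the dot product, a two-dimensional subspace $V$ with a hypothetical orthonormal basis, and the Hermitian inner product $\langle u, v\rangle = \sum_i u_i \bar v_i$; it evaluates $f$ at $\eta^* = c\,\pi_V\xi$, compares with $-|c|^2|\pi_V\xi|^2$, and confirms that random perturbations inside $V_{\mathbb{C}}$ never decrease $f$.

```python
import math
import random

# Hypothetical choice: S = R^3 with the dot product, V = span{U1, U2} (orthonormal).
U1 = (1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0), 0.0)
U2 = (0.0, 0.0, 1.0)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project(xi):
    """Orthogonal projection pi_V xi of a real vector onto V."""
    c1, c2 = dot(xi, U1), dot(xi, U2)
    return [c1 * U1[i] + c2 * U2[i] for i in range(3)]


def f(eta, c, xi):
    """f(eta) = |eta|^2 - 2 Re <eta, c xi>, with <u, v> = sum u_i conj(v_i)."""
    sq = sum(abs(e) ** 2 for e in eta)
    cross = sum(e * (c * x).conjugate() for e, x in zip(eta, xi))
    return sq - 2.0 * cross.real


xi = (0.3, -1.2, 0.7)
c = 0.8 - 0.5j
eta_star = [c * p for p in project(xi)]  # proposed extremal c * pi_V xi
predicted = -abs(c) ** 2 * dot(project(xi), project(xi))
print(f(eta_star, c, xi), predicted)

# Random perturbations inside the complexification V_C never lower f.
rng = random.Random(0)
worst = math.inf
for _ in range(1000):
    z1 = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
    z2 = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
    eta = [eta_star[i] + z1 * U1[i] + z2 * U2[i] for i in range(3)]
    worst = min(worst, f(eta, c, xi) - f(eta_star, c, xi))
print(worst >= -1e-12)
```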


Theorem 4.1 gives a formula for $Q_\theta W$ in terms of a single real constant $g$, which must be determined through an optimization over $S^2$ (see (4.4)). If $\alpha_1 - \alpha_2$ is dyadic, i.e. if

$$\alpha_1 - \alpha_2 = p \otimes q + q \otimes p \qquad (4.11)$$

for some $p, q \in \mathbb{R}^3$, then clearly $g = |a^{1/2}(\alpha_1 - \alpha_2)|^2$, and the extremals for (4.4) are $n = \pm p/|p|$ and $n = \pm q/|q|$. This is the case when the two phases can coexist in their stress-free states, separated by hyperplanes orthogonal to $p$ or $q$. If (4.11) does not hold, then $g < |a^{1/2}(\alpha_1 - \alpha_2)|^2$, and the optimization (4.4) is more subtle. It has been examined, and the extremal choices of $n$ have been identified, under various symmetry hypotheses on $a$ [26,41,48,55,60,61]; the most complete such study is [41].

We have observed that there is always an extremal geometry which is layered. The extremal geometry is generally not unique, however. The proof of Theorem 4.1 shows that a geometry is optimal exactly if $\hat\chi_1$ is concentrated on the extremals of (4.4), that is, if $\hat\chi_1(k) \ne 0$ only when $k/|k|$ is extremal. Similarly, a sequence of microstructures is asymptotically optimal if the associated marker functions $\chi_1^j$, $\chi_2^j = 1 - \chi_1^j$ have their Fourier transforms concentrated at the extremals of (4.4), asymptotically as $j \to \infty$. This is best made precise by using the notion of an H-measure, recently introduced and explored in a much more general setting by Tartar [58] and Gerard [24]. In our periodic setting, the H-measure associated to a periodic marker function $\chi_1$ is the measure $\mu$ on $S^2$ defined by

$$\mu = \sum_{k \ne 0} |\hat\chi_1(k)|^2\, \delta_{k/|k|}, \qquad (4.12)$$

where $k$ ranges over $\mathbb{Z}^3$ and $\delta_n$ is the Dirac measure concentrated at $n$. With this terminology, we can assert:

Proposition 4.3. Consider a sequence of microstructures corresponding to marker functions $\chi_1^j$, $\chi_2^j = 1 - \chi_1^j$, $j = 1, 2, \ldots$, and let $\mu_j$ be the H-measure associated to $\chi_1^j$ as in (4.12). Then the sequence $\{\chi_1^j\}$ is asymptotically optimal for (4.5) if and only if every weak limit of the sequence $\{\mu_j\}$ is supported on the set of extremals for (4.4).

The proof is an easy extension of that given for Theorem 4.1, so it is omitted.
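For a laminate the H-measure (4.12) can be written down explicitly. Assuming $\chi_1(x) = f(x \cdot k)$ with $f$ the $2\pi$-periodic indicator of $[0, 2\pi\theta_1)$, the only nonzero coefficients sit at $mk$, $m \ne 0$, with $|\hat f(m)|^2 = \sin^2(\pi m\theta_1)/(\pi m)^2$ (a standard Fourier series computation, not taken from the text). The sketch below sums a truncated series, so $\mu$ consists of two equal atoms at $\pm k/|k|$ whose total mass approaches $\theta_1\theta_2$; in the language of Proposition 4.3, such a laminate is optimal exactly when $\pm k/|k|$ are extremal for (4.4).

```python
import math


def laminate_h_measure(k, theta1: float, m_max: int = 20000):
    """Truncated H-measure (4.12) of the laminate chi_1(x) = f(x . k).

    f is the 2*pi-periodic indicator of [0, 2*pi*theta1). Only the frequencies m*k, m != 0,
    carry weight |f_hat(m)|^2 = sin^2(pi m theta1) / (pi m)^2, so the measure lives on
    the two directions +k/|k| and -k/|k|.
    """
    norm = math.sqrt(sum(c * c for c in k))
    plus = tuple(c / norm for c in k)
    minus = tuple(-c for c in plus)
    mu = {plus: 0.0, minus: 0.0}
    for m in range(1, m_max + 1):
        w = math.sin(math.pi * m * theta1) ** 2 / (math.pi * m) ** 2
        mu[plus] += w   # frequency  m * k
        mu[minus] += w  # frequency -m * k (same weight because f is real)
    return mu


mu = laminate_h_measure((1, 2, 0), 0.3)
print(mu)
print(sum(mu.values()), 0.3 * 0.7)  # total mass approaches theta1 * theta2
```

The truncation error in the total mass is at most about $2/(\pi^2 m_{\max})$, since $\sin^2 \le 1$.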

Remark 4.4. In the homogenization literature, many composites with extremal characteristics have been found in the class of "sequentially laminated" microstructures, see e.g. [2,23,38,46,49,50,57]. The relevance of such microstructures to coherent phase mixtures was noted long ago by Roitburd, who called them "polydomain structures of second or higher order" [51-54]. Using this construction, one can build a large variety of optimal phase microgeometries if (4.4) has at least two linearly independent extremals. See [36] for details.


The relaxation of (4.1) is entirely determined by Proposition 3.1 and Theorem 4.1. We refer to [36] for a detailed discussion of the properties of $QW$. As an application of Theorem 4.1, however, let us show here that $QW$ still has double-well structure if the stress-free strains $\alpha_1$ and $\alpha_2$ are incompatible.

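Proposition 4.5. Let $W$ be given by (4.1), and assume that $\alpha_1 - \alpha_2$ does not have the form $p \otimes q + q \otimes p$. Then $QW(\alpha_1) = QW(\alpha_2) = 0$ and

$$QW(\xi) > 0 \quad \text{for } \xi \ne \alpha_1, \alpha_2. \qquad (4.13)$$

Proof: As a first step, we claim that the formula for $Q_\theta W$ can be expressed as

$$Q_\theta W(\xi) = |a^{1/2}[\xi - \alpha(\theta)]|^2 + \theta_1\theta_2\, h \qquad (4.14)$$

with

$$\alpha(\theta) = \theta_1\alpha_1 + \theta_2\alpha_2, \qquad h = |a^{1/2}(\alpha_1 - \alpha_2)|^2 - g.$$

Indeed, the equivalence of (4.3) and (4.14) is a matter of elementary manipulation, making use of the identity

$$|a^{1/2}(\xi - \alpha(\theta))|^2 = \theta_1|a^{1/2}(\xi - \alpha_1)|^2 + \theta_2|a^{1/2}(\xi - \alpha_2)|^2 - \theta_1\theta_2|a^{1/2}(\alpha_1 - \alpha_2)|^2.$$

Notice that $h > 0$ as a consequence of our hypothesis on $\alpha_1 - \alpha_2$.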

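Next, we apply Proposition 3.1 to obtain that

$$QW(\xi) = \min_{\theta \in \mathcal{V}} \left\{|a^{1/2}[\xi - \alpha(\theta)]|^2 + \theta_1\theta_2\, h\right\}. \qquad (4.15)$$

The minimum in (4.15) is achieved, since $\theta = (\theta_1, \theta_2)$ varies over the compact set

$$\{\theta_1 \ge 0,\ \theta_2 \ge 0,\ \theta_1 + \theta_2 = 1\}.$$

Now the assertions of the proposition follow easily. It is obvious that $QW \ge 0$; moreover $QW \le W$ and $W(\alpha_1) = W(\alpha_2) = 0$, so $QW(\alpha_1) = QW(\alpha_2) = 0$. If, for some $\xi$, $QW(\xi) = 0$, then the optimal $\theta$ for (4.15) would give

$$|a^{1/2}[\xi - \alpha(\theta)]|^2 + \theta_1\theta_2\, h = 0.$$

Both terms are non-negative, so they must both vanish. We conclude that $\xi = \alpha(\theta)$ and $\theta_1\theta_2 = 0$. Therefore either $\theta = (1, 0)$, $\xi = \alpha_1$, or else $\theta = (0, 1)$, $\xi = \alpha_2$. Thus $QW$ is strictly positive except at $\alpha_1$ and $\alpha_2$.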
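The identity relating (4.3) and (4.14), and the positivity argument based on (4.15), can be checked numerically. The sketch below assumes $a$ is the identity (so $a^{1/2}$ drops out), represents strains as vectors in $\mathbb{R}^3$ for brevity, and simply prescribes a value of $g$ with $0 \le g < |\alpha_1 - \alpha_2|^2$ instead of computing it from (4.4); the minimum over $\theta$ is taken on a grid.

```python
def sq(v):
    return sum(c * c for c in v)


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def mix(t1, u, v):
    """alpha(theta) = theta_1 u + theta_2 v with theta_2 = 1 - theta_1."""
    return [t1 * a + (1.0 - t1) * b for a, b in zip(u, v)]


def q_theta_43(xi, a1, a2, t1, g):
    """Formula (4.3) with a^{1/2} taken to be the identity."""
    t2 = 1.0 - t1
    return t1 * sq(sub(xi, a1)) + t2 * sq(sub(xi, a2)) - t1 * t2 * g


def q_theta_414(xi, a1, a2, t1, g):
    """Formula (4.14) with h = |alpha_1 - alpha_2|^2 - g."""
    h = sq(sub(a1, a2)) - g
    return sq(sub(xi, mix(t1, a1, a2))) + t1 * (1.0 - t1) * h


def qw(xi, a1, a2, g, steps: int = 1000):
    """Grid approximation of (4.15): minimize (4.14) over theta_1 in [0, 1]."""
    return min(q_theta_414(xi, a1, a2, i / steps, g) for i in range(steps + 1))


a1, a2 = [1.0, 0.0, -1.0], [0.0, 0.5, 0.0]
g = 0.7  # hypothetical value with 0 <= g < |a1 - a2|^2 = 2.25, so h > 0
print(qw(a1, a1, a2, g), qw(a2, a1, a2, g), qw(mix(0.5, a1, a2), a1, a2, g))
```

The two wells give zero, while the midpoint $\alpha(1/2)$ gives a strictly positive value, as Proposition 4.5 predicts when $h > 0$.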


5. Toward the calculation of $Q_\theta W$ for a system of many phases, all with the same elastic moduli.

It  is  natural  to  ask  whether  the  method  of  Section  4  might  be  applied  to  an  energy 
function  describing  more  than  two  phases.  The  answer  at  present  is  no.  One  can  certainly 
begin  the  same  way,  but  it  is  not  clear  how  to  minimize  the  resulting  expression  in  Fourier 
space over the class of all microstructures. The calculation given here is equivalent to that
of  Khachaturyan  and  Shatalov  in  [32,33].  We  present  it  to  clarify  the  relation  between  the 
linear  and  nonlinear  variational  theories,  and  to  focus  attention  on  this  as  a  significant  open 
problem. 

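We consider a system of $N$ phases, with stress-free strains $\alpha_1, \ldots, \alpha_N$ and the same elastic law $a$:

$$W(\xi) = \min\{W^1(\xi), \ldots, W^N(\xi)\}, \qquad W^i(\xi) = \langle a(\xi - \alpha_i), \xi - \alpha_i\rangle, \quad 1 \le i \le N. \qquad (5.1)$$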

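Our starting point is once again Lemma 3.2: for any vector $\theta = (\theta_1, \ldots, \theta_N)$ of volume fractions,

$$Q_\theta W(\xi) = \inf_{\{\chi_j\}}\ \inf_\phi\ \frac{1}{|C|}\int_C \sum_{j=1}^N \chi_j W^j(\xi + e(\phi))\,dx. \qquad (5.2)$$

Here $\phi$ ranges over periodic deformations, and $\{\chi_j\}$ are the periodic marker functions associated to any subdivision of the unit cell with the specified volume fractions. Expanding the integrand of (5.2) gives

$$\sum_j \chi_j W^j(\xi) + 2\sum_j \chi_j\langle \xi - \alpha_j, a e(\phi)\rangle + \langle a e(\phi), e(\phi)\rangle,$$

since $\sum_j \chi_j = 1$. The average of this expression is

$$\sum_j \theta_j W^j(\xi) + \frac{1}{|C|}\int_C \left[\langle a e(\phi), e(\phi)\rangle - 2\sum_j \chi_j\langle \alpha_j, a e(\phi)\rangle\right],$$

since the part of the middle term involving $\xi$ integrates by parts to zero. Thus the essential problem in calculating $Q_\theta W$ is the evaluation of

$$\inf_{\{\chi_j\}:\ \bar\chi_j = \theta_j}\ \inf_\phi\ \frac{1}{|C|}\int_C \left[\langle a e(\phi), e(\phi)\rangle - 2\sum_j \chi_j\langle \alpha_j, a e(\phi)\rangle\right]dx. \qquad (5.3)$$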

As before, we may evaluate the minimum over $\phi$ by Fourier analysis: fixing $\{\chi_j\}$, we seek

$$\inf_\phi\ \frac{1}{|C|}\int_C |a^{1/2}e(\phi)|^2 - 2\left\langle a^{1/2}\sum_j \chi_j\alpha_j,\ a^{1/2}e(\phi)\right\rangle dx. \qquad (5.4)$$

Taking the Fourier transform, then minimizing at each frequency as in (4.6)-(4.7), shows that (5.4) equals

$$-\sum_{k \ne 0} \left|\pi_{a^{1/2}V(k)}\, a^{1/2}\sum_j \hat\chi_j(k)\,\alpha_j\right|^2. \qquad (5.5)$$

This can be written in terms of H-measures as follows: for $1 \le i, j \le N$, let $\mu = (\mu_{ij})$ be the Hermitian matrix valued measure on $S^2$ with components

$$\mu_{ij} = \sum_{k \ne 0} \hat\chi_i(k)\,\overline{\hat\chi_j(k)}\, \delta_{k/|k|}.$$

Then (5.5) can be rearranged to give

$$-\sum_{i,j=1}^N \int_{S^2} \left\langle \pi_{a^{1/2}V(n)}\, a^{1/2}\alpha_i,\ \pi_{a^{1/2}V(n)}\, a^{1/2}\alpha_j\right\rangle d\mu_{ij}(n). \qquad (5.6)$$

Thus calculating $Q_\theta W$ is equivalent to minimizing (5.5) over all marker functions $\{\chi_j\}$ with the specified volume fractions; alternatively, it is equivalent to minimizing a certain linear functional over the class of all H-measures associated to such marker functions. For the case of two phases ($N = 2$) this is precisely what we did in Section 4; for three or more phases we are presently unable to give a formula (or a finite-dimensional optimization) for (5.5)-(5.6).

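There is one special case when the calculation can be completed: that is when $a$ is isotropic and each $\alpha_j$ is a multiple of the identity, say $\alpha_j = \lambda_j I$. It is well-known that under these circumstances the elastic energy is independent of phase geometry. To recover this result, we observe that (5.5) becomes

$$-\sum_{k \ne 0}\sum_{i,j} \lambda_i\lambda_j\, \hat\chi_i(k)\,\overline{\hat\chi_j(k)}\, \left|\pi_{a^{1/2}V(k)}(a^{1/2}I)\right|^2. \qquad (5.7)$$

Since $a$ is isotropic, $|\pi_{a^{1/2}V(k)}(a^{1/2}I)|^2$ cannot depend on $k$, and

$$\sum_{k \ne 0}\sum_{i,j} \lambda_i\lambda_j\, \hat\chi_i(k)\,\overline{\hat\chi_j(k)} = \frac{1}{|C|}\int_C \Big|\sum_i \lambda_i(\chi_i - \theta_i)\Big|^2,$$

which equals $\sum_i \lambda_i^2\theta_i - (\sum_i \lambda_i\theta_i)^2$ (because $\chi_i\chi_j = \delta_{ij}\chi_i$) and so depends only on $\theta = (\theta_1, \ldots, \theta_N)$. Thus (5.5) depends only on $\theta$, not on the phase geometry.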
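The geometry independence in the isotropic, dilatational case can be illustrated on a grid. In the sketch below (hypothetical strains $\lambda_j$; a grid average stands in for the cell average), each cell carries a single phase label, so $\sum_j \lambda_j(\chi_j - \theta_j)$ equals $\lambda_i - \sum_j \lambda_j\theta_j$ on a cell of phase $i$; rearranging the cells leaves the average unchanged and equal to $\sum_i \lambda_i^2\theta_i - (\sum_i \lambda_i\theta_i)^2$.

```python
import random


def fluctuation_average(labels, lambdas):
    """Grid average of |sum_i lambda_i (chi_i - theta_i)|^2 for a partition given by cell labels."""
    n = len(labels)
    theta = [labels.count(i) / n for i in range(len(lambdas))]
    mean = sum(l * t for l, t in zip(lambdas, theta))
    # On a cell of phase i, sum_j lambda_j (chi_j - theta_j) = lambda_i - mean.
    return sum((lambdas[lab] - mean) ** 2 for lab in labels) / n


def closed_form(theta, lambdas):
    """sum_i lambda_i^2 theta_i - (sum_i lambda_i theta_i)^2, which involves theta only."""
    mean = sum(l * t for l, t in zip(lambdas, theta))
    return sum(l * l * t for l, t in zip(lambdas, theta)) - mean ** 2


lambdas = [0.02, -0.01, 0.05]            # hypothetical dilatational strains lambda_j
banded = [0] * 10 + [1] * 20 + [2] * 30  # phases arranged in three bands
shuffled = banded[:]
random.Random(1).shuffle(shuffled)       # same volume fractions, different geometry
theta = [10 / 60, 20 / 60, 30 / 60]
print(fluctuation_average(banded, lambdas), fluctuation_average(shuffled, lambdas),
      closed_form(theta, lambdas))
```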
Even when $Q_\theta W$ cannot be computed explicitly, it still makes sense to ask where $Q_\theta W = 0$. The point is that $Q_\theta W(\xi) \ge 0$ for all $\xi$ (since each $W^j \ge 0$); so one can establish that $Q_\theta W(\xi_\theta) = 0$ for some $\xi_\theta$ by displaying a microstructure which achieves this. In other words, to show that $Q_\theta W(\xi_\theta) = 0$ it suffices to prescribe test fields $\phi$ and $\{\chi_j\}$ for use in (5.2) which have $\sum_j \chi_j W^j(\xi_\theta + e(\phi)) = 0$ almost everywhere. The class of sequentially laminated microstructures provides a powerful tool for such constructions; it has been used in [32,33] (in the linearized setting) and in [4,5] (in the geometrically nonlinear context) to show that $Q_\theta W(\xi_\theta) = 0$ for certain $\xi_\theta$, when $W$ models a cubic-tetragonal phase transition such as that discussed in Section 2.

An intriguing open question is whether the extremum of (5.5) can always be found within the class of sequentially layered microstructures. We hope that the answer is affirmative. Such a result seems, however, beyond the power of existing mathematical methods.
Abstract. When cooled below a certain critical temperature $\theta_c$, many crystals undergo a structural phase transformation marked by the appearance of microstructure. In the simplest case, this microstructure consists of fine parallel bands, ranging in thickness from a few atomic spacings in NiMn (termed microtwinning by Baele, van Tendeloo and Amelinckx [2]) to a few microns in InTl. In some of these materials the application of limited displacements to the boundary of the crystal causes a rearrangement of the microstructure. In this talk I describe recent attempts to understand why such microstructures form and how imposed deformations are accommodated by rearrangements of the microstructure. The ideas suggest a new approach to micromagnetics, described in Section 5.

1. Introduction

The  great  usefulness  of  the  classical  field  theories  of  elasticity,  hydrodynamics, 
thermodynamics  and  electromagnetism  arises  from  their  ability  to  accurately  predict,  from  a 
knowledge  only  of  boundary  and  initial  conditions  and  a  few  material  parameters,  the  complex 
fields  of  deformation,  stress,  temperature  and  electromagnetic  fields  in  a  deforming  material.  On 
the  whole,  these  theories  have  failed  to  make  similar  predictions  about  materials  containing 
domains or defects. Instead, the historical practice has been to make rather restrictive
assumptions  about  the  geometry  and  arrangement  of  defects  and  then  to  calculate  something  about 
them  using  linear  theory.  This  has  led  to  a  hodgepodge  of  special  theories  of  defects  having  the 
inherent  limitation  that  they  are  unable  to  deal  with  any  situation  not  envisaged  by  the  severe 
geometric  restrictions  assumed  at  the  outset.  In  particular,  they  are  generally  unable  to  explore 
conditions  that  might  give  rise  to  new  and  unusual  microstructures  important  to  the  development  of 
advanced  materials.  They  are  also  extremely  limited  in  coping  with  dynamic  situations. 

The point of view adopted in the research described here is that the domain structure itself should be predicted from some equations without a priori geometric restrictions. This point of view is not new and was expressed nicely by W.F. Brown [8, 9] in his books on the domain structure of magnetic materials; he termed this approach micromagnetics. In the magnetic case, the competing theory involving geometric (and other) restrictions is called domain theory. Similarly, in the area of martensitic transformations the crystallographic theory of martensite has served the subject well, but primarily as a way to understand only a rather special microstructure among many that are actually observed. A new theory, in some ways analogous to micromagnetics, developed by J.M. Ball and the author ([3, 4]; see also Chipot and Kinderlehrer [7]), is designed to avoid the geometric restrictions adopted by the crystallographic theory and to offer possibilities for prediction of complex microstructures. In this talk I describe recent predictions of this theory and a plan for an experimental test of the theory.

* Sponsored by the U.S. Army Research Office. Section 5 of this paper was presented at the "Elasticity Retreat", South Pomfret, Vermont, August 23, 1989.

These  recent  predictions  all  concern  unloaded  crystals.  In  a  companion  paper  in  this 
volume,  R.  Kohn  [13]  discusses  recent  results  on  the  microstructure  of  loaded  crystals  and  relates 
these  results  to  the  metallurgical  literature. 

I  return  to  micromagnetics  in  Section  5.  Despite  the  attractiveness  of  Brown’s  philosophy, 
his  calculations  met  with  limited  success.  The  reason  for  this  has  been  explored  recently  by  D. 
Kinderlehrer  and  the  author  [10]  and  seems  to  arise  from  the  fact  that  the  free  energy  he  adopts 
does  not  have  a  minimum  in  the  conventional  sense. 

2. Austenite/Martensite Interface

The  crystallographic  theory  of  martensite,  due  to  Wechsler,  Lieberman  and  Read  [19]  and 
Bowles  and  Mackenzie  [6],  is  a  theory  for  the  description  of  a  special  microstructure  known  as  the 
austenite/martensite interface. This microstructure, pictured in Figure 1, consists of fine bands of martensite (twins) on one side of the interface and homogeneous austenite on the other side of the interface. The theory predicts the normal to the interface (the so-called habit plane), the proportion of volume occupied by one twin relative to the other, the orientation of the twinned martensite lattice relative to the austenite lattice, and the macroscopic deformation of the twinned martensite relative to the austenite. By “macroscopic deformation” is meant the homogeneous deformation which takes the austenite lattice to the twinned martensite lattice, neglecting the small zig-zags produced by the twins themselves. Figure 1 shows a picture of the atoms in a typical austenite/martensite interface. This picture was generated using lattice parameters appropriate to the alloy Ni62Al38, and only Ni atoms are shown. Recent high resolution electron micrographs of
Schryvers  [16]  show  many  of  the  features  of  Figure  1  including  accurate  atomic  periodicity  of  the 
twins. 


The crystallographic theory of martensite is based on the following assumptions. A certain twin system is presumed given; this means that orthogonal vectors $a$ and $n$ are given, where $1 + a \otimes n$ is the shear that maps the transformed lattice into its twin. It is recognized that the martensite (to the left of the interface in Figure 1) is finely twinned, so that its macroscopic deformation is in fact

$$P_\lambda := 1 + \lambda\, a \otimes n,$$

where $\lambda \in (0, 1)$ represents the volume fraction of one variant of martensite relative to its twin. A positive-definite symmetric matrix $U$ is given which represents the pure stretch, or Bain strain, associated with the transformation from austenite to untwinned martensite. The average deformation of the twinned martensite body is assumed to have the form $E = 1 + b \otimes m$, $m$ being the normal to the austenite/martensite interface. The basic equation of the crystallographic theory is then written [see, e.g., 19]

$$E = RUP_\lambda, \qquad (2.1)$$

or, using the definitions given above,

$$1 + b \otimes m = RU(1 + \lambda\, a \otimes n). \qquad (2.2)$$
Figure 1. Austenite-martensite interface drawn with lattice parameters appropriate to Ni62Al38.

Here $R \in SO(3)$ is an arbitrary rotation matrix. In applications of equation (2.2), the matrix $U$ and the vectors $a$ and $n$ are given and (2.2) is solved for $b \in \mathbb{R}^3$, $m \in \mathbb{R}^3$, $R \in SO(3)$, $\lambda \in (0, 1)$. An excellent treatment of the crystallographic theory is given in the original paper of Wechsler, Lieberman and Read [19].

An alternative view of this microstructure is given by Ball and James [3]. They consider a free energy density $\phi(Dy, \theta)$, where $y: \Omega \to \mathbb{R}^3$ describes the deformation, either ordinary elastic deformation or the deformation associated with transformation, and $\theta$ represents the temperature. Principles of frame-indifference and lattice symmetry are used to restrict the form of $\phi$. With symmetry assumptions appropriate to a cubic-to-tetragonal phase transformation, there emerge three scalar functions of temperature, $\alpha(\theta)$, $\eta_1(\theta)$ and $\eta_2(\theta)$, and a fixed orthonormal basis $\{e_i\}$ such that

$$\begin{aligned} \text{for } \theta > \theta_c:&\quad \phi(\alpha(\theta)1, \theta) \le \phi(F, \theta) \quad \forall F, \\ \text{for } \theta < \theta_c:&\quad \phi(U_i(\theta), \theta) \le \phi(F, \theta) \quad \forall F, \end{aligned} \qquad (2.3)$$

where $U_i(\theta) := \eta_1(\theta)1 + (\eta_2(\theta) - \eta_1(\theta))\, e_i \otimes e_i$ (no sum).

Thus $\phi$ has “potential wells” and, after accounting for the condition of frame-indifference ($\phi(RF, \theta) = \phi(F, \theta)$ for all $F$ and all $R \in SO(3)$), we are led to adopt the terminology

$$\alpha(\theta)SO(3) \quad \text{(austenite well)}, \qquad SO(3)U_1(\theta) \cup SO(3)U_2(\theta) \cup SO(3)U_3(\theta) \quad \text{(martensite wells)}.$$

Microstructures in stable equilibrium at the temperature $\theta$ are minimizers $y(x)$ of the total energy

$$E_\theta[y] = \int_\Omega \phi(Dy(x), \theta)\,dx. \qquad (2.4)$$

This formulation does not involve the geometric restrictions mentioned in the Introduction.

The energy $E_\theta[\cdot]$ fails the condition of weak lower semicontinuity in $W^{1,\infty}(\Omega, \mathbb{R}^3)$. The effect of this is that there are minimizing sequences $y^{(k)} \overset{*}{\rightharpoonup} y$ in $W^{1,\infty}$ with

$$E_\theta[y] > \inf_k E_\theta[y^{(k)}].$$

These sequences involve finer and finer oscillations which model the phenomenon of microtwinning as pictured in Figure 1.

How does the austenite/martensite interface emerge as a minimizing sequence? There is a huge variety of minimizing sequences to the problem, just as there is a huge variety of microstructures observed in transformed crystals. To discuss the austenite/martensite interface we gather from Figure 1 that essentially three deformation gradients participate in this microstructure. As a way of quantifying this restriction, the concept of a Young measure is useful. The basic theorem on Young measures states that to every sequence $y^{(k)} \overset{*}{\rightharpoonup} y$ (in $W^{1,\infty}$) we can assign (after passing to a subsequence) a family of probability measures $\nu_x$, $x \in \Omega$, such that for every continuous function $f: M^{3\times3} \to \mathbb{R}$,

$$f(Dy^{(k)}) \overset{*}{\rightharpoonup} \int_{M^{3\times3}} f(G)\, d\nu_x(G).$$

A version of this theorem which is ideally suited to these problems of microstructure is given by Ball [5]. An easy consequence of this theorem is that (with mild growth assumptions on $\phi$) for any minimizing sequence of $E_\theta[\cdot]$, the support of the Young measure lies on the potential wells of $\phi(\cdot, \theta)$. This support may be thought of as the set of deformation gradients that “participate” in the microstructure.
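The defining property of the Young measure can be seen in a one-dimensional toy laminate (our own illustration, not an example from the text). Assume $y_k' = s_+$ on the first fraction $\lambda$ of each interval of length $1/k$ and $y_k' = s_-$ on the rest. Testing $f(y_k')$ against the indicator of $[0, x_0]$, the integrals converge to those of $\langle \nu, f\rangle$ with $\nu = \lambda\delta_{s_+} + (1 - \lambda)\delta_{s_-}$, which for nonlinear $f$ differs from $f$ evaluated at the average slope.

```python
import math


def plus_length(k: int, lam: float, x0: float) -> float:
    """Length of {x in [0, x0] : y_k'(x) = s_plus} for the laminate with period 1/k.

    In each cell [j/k, (j+1)/k) the slope is s_plus on the first lam/k and s_minus on the rest.
    """
    full = math.floor(k * x0)
    rest = x0 - full / k
    return full * lam / k + min(rest, lam / k)


def integral_of_f(f, s_plus, s_minus, lam, k, x0):
    """Exact value of the integral of f(y_k') over [0, x0]."""
    lp = plus_length(k, lam, x0)
    return f(s_plus) * lp + f(s_minus) * (x0 - lp)


def young_limit(f, s_plus, s_minus, lam, x0):
    """Integral over [0, x0] of <nu, f> with nu = lam delta_{s_plus} + (1 - lam) delta_{s_minus}."""
    return x0 * (lam * f(s_plus) + (1.0 - lam) * f(s_minus))


def cubic(s: float) -> float:
    """An arbitrary nonlinear continuous test integrand."""
    return s ** 3 + s


for k in (10, 100, 1000):
    err = abs(integral_of_f(cubic, 1.0, -0.5, 0.4, k, 0.37) - young_limit(cubic, 1.0, -0.5, 0.4, 0.37))
    print(k, err)
```

The error is at most $\lambda(1 - \lambda)|f(s_+) - f(s_-)|/k$, since only the last, partially covered interval contributes.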
Hence, the austenite-martensite interface should be modeled by a minimizing sequence $y^{(k)}$ whose Young measure $\nu_x$ is supported on three matrices $F^+$, $F^-$, $1$, where $F^+ - F^- = a \otimes n$ for some $a \in \mathbb{R}^3$, $n \in \mathbb{R}^3$, and $F^+$ and $F^-$ belong to the martensite wells (it is easily calculated that the martensite wells have such rank-1 connections). In the spirit of “no geometric restrictions,” we prefer to say nothing about how the sets on which $Dy^{(k)}$ takes on (approximately) the values $1$, $F^+$ and $F^-$ are arranged. Under the condition only that $\nu_x$ is supported on $\{1, F^+, F^-\}$, James and Kinderlehrer [11] prove that there is $\lambda \in (0, 1)$ and vectors $b$ and $m$ such that

$$\lambda F^+ + (1 - \lambda)F^- = 1 + b \otimes m. \qquad (2.5)$$

But (2.5) is precisely the equation of the crystallographic theory of martensite when we recognize that

$$\lambda F^+ + (1 - \lambda)F^- = F^- + \lambda\, a \otimes n = RU_i(1 + \lambda\, \hat a \otimes n),$$

where $i = 1, 2$ or $3$ and $\hat a = (RU_i)^{-1}a$. It is found that this vector $\hat a$ is exactly the one used as input to the crystallographic theory. Further information on the geometry of this microstructure (still obtained from the same assumption) is given by James and Kinderlehrer [11], and this information is in complete agreement with Figure 1 when the calculation is specialized to the measured lattice parameters $\eta_1(\theta_c)$ and $\eta_2(\theta_c)$ appropriate to Ni62Al38.

Note that in the analysis above, it was assumed that two of the matrices ($F^+$ and $F^-$) differ by a rank-1 matrix. It is still not known whether there exists a microstructure whose Young measure is supported on three matrices with no rank-1 connections, but an example of James and Kohn [12] shows that there exist microstructures with Young measure supported on four matrices having no rank-1 connections.

3.  The  Two-Well  Problem 

From the point of view of “no geometric restrictions,” the crystallographic theory of martensite provides only a weak test of the theory. With this in mind Ball and James [4] consider the problem of determining all possible macroscopic deformations that can be realized by microstructures involving just two variants. Physically, these deformations should be associated with flat regions
on  stress-strain  curves,  common  in  materials  that  undergo  reversible  martensitic  transformations 
and  thought  to  be  associated  with  rearrangements  of  the  variants.  Shape-memory  materials  are 
interesting  materials  which  easily  rearrange  their  variants  in  response  to  imposed  distortions,  and  it 
is  hoped  that  such  calculations  may  reveal  the  reason  for  this.  However,  presently  only  two 
variants  have  been  considered,  while  most  of  the  good  shape-memory  materials  have  six  or  twelve 
variants. 

The simplest version of the problem is as follows. The two variants are defined by the set
of matrices

$$\mathcal{U} = SO(3)U_1 \cup SO(3)U_2, \qquad (3.1)$$
where $U_1 = U_1^T > 0$ and $U_2 = U_2^T > 0$ are distinct $3\times 3$ matrices. We assume that $\det U_2 = \det U_1$ and that
there is an $R\in SO(3)$ such that

$$RU_2 - U_1 = a\otimes n. \qquad (3.2)$$

These assumptions are satisfied in the typical case of two variants of the less symmetric phase (in
which case there is an $R$ belonging to the point group of the more symmetric phase such that
$RU_2R^T = U_1$).

To be definite, we assume that

$$U_1 = \eta_1\mathbf{1} + (\eta_2-\eta_1)\,e_1\otimes e_1, \qquad U_2 = \eta_1\mathbf{1} + (\eta_2-\eta_1)\,e_2\otimes e_2, \qquad (3.3)$$

as in the cubic-to-tetragonal case, although the analysis of [4] operates under the more general
assumptions listed above. Consider the problem

$$\inf_{y\in W^{1,\infty}} \int_\Omega \phi(Dy(x))\,dx, \qquad (3.4)$$

where $\phi$ has strict potential-well minima on $\mathcal{U}$. Under mild restrictions on $\phi$, any minimizing
sequence $y^{(k)} \overset{*}{\rightharpoonup} y$ in $W^{1,\infty}(\Omega, \mathbb{R}^3)$ has the property that its Young measure $\nu_x$ is supported on
$\mathcal{U}$. Some of these sequences will also satisfy the linear boundary conditions*

$$y^{(k)}(x) = Fx, \qquad x\in\partial\Omega. \qquad (3.5)$$

The two-well problem is the problem of finding all matrices $F$ for which such a sequence exists with $\operatorname{supp}\nu_x \subseteq \mathcal{U}$ a.e. That is,
which macroscopic deformations $F$ can be achieved by mixtures of the two
variants $SO(3)U_1$ and $SO(3)U_2$?
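For the special matrices (3.3), hypothesis (3.2) can be checked numerically. The sketch below rests on assumptions not stated in the text: $e_1, e_2, e_3$ are taken to be the standard basis; solvability is tested with the eigenvalue criterion due to Ball and James (the matrix $C = U_1^{-1}U_2^2U_1^{-1} \neq \mathbf{1}$ must have eigenvalues $\lambda_1 \le 1 = \lambda_2 \le \lambda_3$); and one explicit solution, with $n$ parallel to $e_1 + e_2$ and twinning shear $s = (\eta_2^2 - \eta_1^2)/(\eta_1\eta_2)$, is built by hand for illustration. The helper names and the sample values $\eta_1 = 0.95$, $\eta_2 = 1.1$ are hypothetical.

```python
import math

I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def outer(u, v):
    return [[u[i] * v[j] for j in range(3)] for i in range(3)]

def axpy(A, B, c):
    """Return A + c * B."""
    return [[A[i][j] + c * B[i][j] for j in range(3)] for i in range(3)]

def det3(G):
    return (G[0][0] * (G[1][1] * G[2][2] - G[1][2] * G[2][1])
            - G[0][1] * (G[1][0] * G[2][2] - G[1][2] * G[2][0])
            + G[0][2] * (G[1][0] * G[2][1] - G[1][1] * G[2][0]))

def variants(eta1, eta2):
    """U1, U2 of (3.3), with e1, e2, e3 taken to be the standard basis (assumption)."""
    U1 = [[eta2, 0.0, 0.0], [0.0, eta1, 0.0], [0.0, 0.0, eta1]]
    U2 = [[eta1, 0.0, 0.0], [0.0, eta2, 0.0], [0.0, 0.0, eta1]]
    return U1, U2

def compatible(U1, U2, tol=1e-12):
    """Eigenvalue test on C = U1^-1 U2^2 U1^-1; valid here only because U1, U2 are diagonal."""
    lam = sorted((U2[i][i] / U1[i][i]) ** 2 for i in range(3))
    return abs(lam[1] - 1.0) < tol and lam[0] <= 1.0 <= lam[2] and lam[2] - lam[0] > tol

def twin(eta1, eta2):
    """Illustrative explicit (R, a, n) with R U2 - U1 = a (x) n for the matrices (3.3)."""
    U1, U2 = variants(eta1, eta2)
    N = math.hypot(eta1, eta2)
    e1h = [eta1 / N, eta2 / N, 0.0]
    e3h = [-eta2 / N, eta1 / N, 0.0]
    s = (eta2 ** 2 - eta1 ** 2) / (eta1 * eta2)        # twinning shear (our derivation)
    F = matmul(axpy(I3, outer(e3h, e1h), s), U1)       # F = (1 + s e3h (x) e1h) U1
    U2_inv = [[1.0 / eta1, 0.0, 0.0], [0.0, 1.0 / eta2, 0.0], [0.0, 0.0, 1.0 / eta1]]
    R = matmul(F, U2_inv)                              # F^T F = U2^2, so R is a rotation
    m = [sum(U1[i][j] * e1h[j] for j in range(3)) for i in range(3)]  # U1 e1h
    norm_m = math.sqrt(sum(x * x for x in m))
    n = [x / norm_m for x in m]                        # unit normal, parallel to e1 + e2
    a = [s * norm_m * x for x in e3h]                  # so that F - U1 = a (x) n
    return R, a, n, U1, U2

R, a, n, U1, U2 = twin(0.95, 1.1)   # hypothetical lattice parameters
print(compatible(U1, U2), [round(x, 6) for x in n])
```

The construction works because $F - U_1 = s\,\hat e_3\otimes U_1\hat e_1$ is rank one by design, and the choice of $s$ makes $F^TF = U_2^2$.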

The answer to this question is as follows. Let

$$\hat e_1 = \frac{\eta_1 e_1 + \eta_2 e_2}{(\eta_1^2+\eta_2^2)^{1/2}}, \qquad \hat e_3 = \frac{\eta_1 e_2 - \eta_2 e_1}{(\eta_1^2+\eta_2^2)^{1/2}}, \qquad (3.6)$$

$$\delta = \frac{\eta_2^2 - \eta_1^2}{2\,\eta_1\eta_2}.$$

Then,  F  can  be  achieved  by  a  mixture  of  two  variants  if  and  only  if 


*  The  analysis  of  [4]  is  not  restricted  to  linear  boundary  conditions. 




Relation  Between  Microscopic  and  Macroscopic  Properties  . . . 


R.  D.  James 


$$F^TF = U_1(\mathbf{1} + \delta\,\hat e_1\otimes\hat e_3)\,C\,(\mathbf{1} + \delta\,\hat e_3\otimes\hat e_1)\,U_1, \qquad (3.7)$$

where $C = C^T$ satisfies

$$\det C = 1, \qquad C\hat e_2 = \hat e_2, \qquad (3.8)$$

and $C_{11} = \hat e_1\cdot C\hat e_1$ and $C_{33} = \hat e_3\cdot C\hat e_3$ lie in the hatched region shown in Figure 2.


Figure  2.  Deformations  achievable  by  mixing  two  variants. 


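The proof of this result consists of two parts. The first part makes use of the weakly
continuous functions, those functions $f$ having the property that

$$f(Dy^{(k)}) \overset{*}{\rightharpoonup} f(Dy) \qquad (3.9)$$

whenever $y^{(k)} \overset{*}{\rightharpoonup} y$ in $W^{1,\infty}(\Omega, \mathbb{R}^3)$. It is known that the weakly continuous functions for
sequences $y^{(k)}: \Omega\to\mathbb{R}^3$ are the affine combinations of the entries of

$$G, \qquad \operatorname{cof} G, \qquad \det G. \qquad (3.10)$$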

Here, $\operatorname{cof} G$ denotes the matrix of cofactors of $G$. The weakly continuous functions give
restrictions on a Young measure with support on $\mathcal{U}$ of the form

$$Dy = \int_{\mathcal{U}} G\,d\nu_x(G), \qquad \operatorname{cof} Dy = \int_{\mathcal{U}} \operatorname{cof} G\,d\nu_x(G), \qquad \det Dy = \int_{\mathcal{U}} \det G\,d\nu_x(G), \qquad (3.11)$$

which in turn lead to restrictions on the Young measure like

$$\operatorname{cof}\int_{\mathcal{U}} G\,d\nu_x(G) = \int_{\mathcal{U}} \operatorname{cof} G\,d\nu_x(G). \qquad (3.12)$$

The  first  part  of  the  proof  exploits  (3.12)  and  the  analogous  restriction  for  det  to  yield  the  result 
summarized  by  Figure  2. 
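For the simplest Young measure, a simple laminate $\nu = \lambda\delta_A + (1-\lambda)\delta_B$ with $\operatorname{rank}(A-B) = 1$, restrictions of the type (3.11) and (3.12) hold automatically, because cof and det are affine along rank-one lines. The minimal pure-Python sketch below measures the failure of (3.12) and of its det analogue for such a two-point measure. The matrices and helper names are hypothetical; a rank-two difference is included to show that the identity then fails.

```python
def det3(G):
    return (G[0][0] * (G[1][1] * G[2][2] - G[1][2] * G[2][1])
            - G[0][1] * (G[1][0] * G[2][2] - G[1][2] * G[2][0])
            + G[0][2] * (G[1][0] * G[2][1] - G[1][1] * G[2][0]))

def cof3(G):
    """Cofactor matrix: entry (i, j) is (-1)^(i+j) times the minor obtained by deleting row i, column j."""
    C = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            r = [k for k in range(3) if k != i]
            c = [k for k in range(3) if k != j]
            C[i][j] = (-1) ** (i + j) * (G[r[0]][c[0]] * G[r[1]][c[1]] - G[r[0]][c[1]] * G[r[1]][c[0]])
    return C

def mix(A, B, lam):
    return [[lam * A[i][j] + (1 - lam) * B[i][j] for j in range(3)] for i in range(3)]

def laminate_defect(A, B, lam):
    """Largest violation of cof/det commuting with averaging against lam*delta_A + (1-lam)*delta_B."""
    G_bar = mix(A, B, lam)
    cof_bar, cof_avg = cof3(G_bar), mix(cof3(A), cof3(B), lam)
    cof_err = max(abs(cof_bar[i][j] - cof_avg[i][j]) for i in range(3) for j in range(3))
    det_err = abs(det3(G_bar) - (lam * det3(A) + (1 - lam) * det3(B)))
    return max(cof_err, det_err)

B = [[1.0, 0.2, 0.0], [0.0, 1.3, 0.1], [0.4, 0.0, 0.9]]
b, m = [0.5, -1.0, 2.0], [1.0, 0.3, -0.7]
A = [[B[i][j] + b[i] * m[j] for j in range(3)] for i in range(3)]             # rank(A - B) = 1
A2 = [[B[i][j] + (1.0 if i == j < 2 else 0.0) for j in range(3)] for i in range(3)]  # rank 2
print(laminate_defect(A, B, 0.3), laminate_defect(A2, B, 0.3))
```

For a rank-two difference $D$ the cofactor defect equals $\lambda(1-\lambda)\operatorname{cof}D$, which is why averaging and cof no longer commute.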

The second part of the proof consists of showing that each point in the domain of Figure 2
is achievable by some microstructure. This follows from an explicit calculation of a family of
sequences. The construction proceeds by selecting suitable matrices $A$, $B$, $C$ from $\mathcal{U}$ with the
properties

$$\operatorname{rank}(A - B) = 1, \qquad \operatorname{rank}[\lambda A + (1-\lambda)B - C] = 1, \qquad (3.13)$$

for some $\lambda\in(0,1)$. The conditions (3.13) suffice for the existence of a sequence $y^{(k)}$ which
essentially describes the “layers within layers” microstructure shown in Figure 3a and satisfies the
boundary conditions (3.5).
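The conditions (3.13) are finite-dimensional and can be tested directly, once more using the fact that a nonzero matrix has rank one exactly when all of its 2×2 minors vanish. The sketch below is illustrative only: the matrices in the usage example are hypothetical and are not taken from $\mathcal{U}$; they merely exercise the check.

```python
from itertools import combinations

def is_rank_one(D, tol=1e-12):
    """True iff the 3x3 matrix D is nonzero and all of its 2x2 minors vanish."""
    if all(abs(D[i][j]) <= tol for i in range(3) for j in range(3)):
        return False
    return all(abs(D[r1][c1] * D[r2][c2] - D[r1][c2] * D[r2][c1]) <= tol
               for r1, r2 in combinations(range(3), 2)
               for c1, c2 in combinations(range(3), 2))

def satisfies_3_13(A, B, C, lam, tol=1e-12):
    """Check rank(A - B) = 1 and rank(lam*A + (1-lam)*B - C) = 1 for lam in (0, 1)."""
    if not 0.0 < lam < 1.0:
        return False
    a_minus_b = [[A[i][j] - B[i][j] for j in range(3)] for i in range(3)]
    inner = [[lam * A[i][j] + (1 - lam) * B[i][j] - C[i][j] for j in range(3)] for i in range(3)]
    return is_rank_one(a_minus_b, tol) and is_rank_one(inner, tol)

A = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
B = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
C = [[1.5, 0.0, 0.0], [0.0, 1.0, -1.0], [0.0, 0.0, 1.0]]   # average minus C = e2 (x) e3
print(satisfies_3_13(A, B, C, 0.5))
```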

There  is,  however,  great  nonuniqueness  in  this  calculation,  and,  for  example,  the 
microstructures  shown  in  Figure  3b  also  suffice*.  In  Figure  3  we  have  also  shown  typical 
macroscopic  deformations  associated  with  these  microstructures.  Of  course,  the  angles  between 
the  layer  groups  and  the  relative  volume  fractions  change  with  F. 

4. Micromagnetics

Ferromagnetic  materials  often  exhibit  fine  microstructures  consisting  of  magnetic  domains. 
Furthermore, it is of interest in such materials to have methods of relating microscopic to
macroscopic properties, both at the atomic/microstructural and at the
microstructural/macroscopic levels. Since Maxwell’s equations are linear, there is no difficulty in
averaging solutions unless it is necessary to average nonlinear functions of the solutions, such as
the  electromagnetic  energy.  The  div-curl  lemma  and  the  method  of  compensated  compactness 
(e.g.,  Tartar  [18])  show  which  nonlinear  functions  can  be  meaningfully  averaged.  Here,  our 
interest  is  not  in  nonlinear  functions  of  the  fields,  but  rather  in  the  nonlinearity  introduced  by  the 


*  Remark  due  to  D.  Kinderlehrer. 
Figure 3 (panels a and b). Microstructures which suffice to achieve all possible linear boundary conditions that
can be achieved by mixing two variants compatibly.

constitutive  properties  of  ferromagnetic  materials.  This  nonlinearity  is  responsible  for  domain 
structure. 

The  results  of  this  section  are  from  recent  work  of  James  and  Kinderlehrer  [10].  These 
results  provide  an  example  of  the  phenomenon  of  frustration  (the  only  rigorous  example  we  know 
of),  this  being  the  phenomenon  whereby  a  material  has  a  defective  ground  state.  Mathematically, 
the  defects  arise  from  the  failure  of  weak  convergence  to  preserve  the  constraint. 

Usually,  the  total  free  energy  of  a  ferromagnetic  material  [14]  is  given  as  a  sum  of 
exchange  energy,  anisotropy  energy  and  magnetostatic  energy: 

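$$E[M] = \int_\Omega c\,|DM|^2\,dx + \int_\Omega \phi(M)\,dx + \frac12\int_{\mathbb{R}^n} |Du|^2\,dx, \qquad (4.1)$$

where $M: \Omega\to\alpha(\theta)S^2$, and the magnetostatic potential $u$ and $M$ are related by

$$\operatorname{div}(-Du + M\chi_\Omega) = 0 \quad\text{on } \mathbb{R}^n. \qquad (4.2)$$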
We hold the temperature constant and therefore put $\alpha(\theta) = 1$ (so $|M| = 1$), without loss of generality.
It is usually assumed that the anisotropy energy $\phi$ is even and quadratic in the direction cosines of
$M$ and that $\phi$ exhibits crystallographic symmetry; we discuss two cases:

1. Uniaxial: $\phi(M_0) = \phi(-M_0) < \phi(M)$ for all $M \neq \pm M_0$,

2. Cubic: $\phi(\pm M_i) < \phi(M)$ for all $M \neq \pm M_j$, $i, j = 1, \ldots, n$

($\{M_i\}$ is an orthonormal basis).
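The two symmetry cases can be made concrete with standard textbook forms. The uniaxial form $K(1 - (M\cdot M_0)^2)$ and the lowest-order cubic form $K(M_1^2M_2^2 + M_2^2M_3^2 + M_3^2M_1^2)$ used below are our assumptions, not the paper's. Note that the cubic form is quartic rather than quadratic in the direction cosines. For $K > 0$ both forms have their strict minima exactly on the easy axes listed above.

```python
import math

def phi_uniaxial(M, M0, K=1.0):
    """Assumed uniaxial anisotropy K*(1 - (M.M0)^2); for unit M and K > 0 it vanishes only at M = +-M0."""
    d = sum(mi * m0i for mi, m0i in zip(M, M0))
    return K * (1.0 - d * d)

def phi_cubic(M, K=1.0):
    """Assumed cubic anisotropy K*(M1^2 M2^2 + M2^2 M3^2 + M3^2 M1^2); minima at +-e_i for K > 0."""
    a, b, c = (mi * mi for mi in M)
    return K * (a * b + b * c + c * a)

def unit(v):
    r = math.sqrt(sum(x * x for x in v))
    return tuple(x / r for x in v)

# Scan unit vectors in the x-z plane: each energy vanishes only on its own easy axes.
for deg in (0, 30, 45, 90, 180):
    t = math.radians(deg)
    M = (math.sin(t), 0.0, math.cos(t))
    print(deg, round(phi_uniaxial(M, (0.0, 0.0, 1.0)), 6), round(phi_cubic(M), 6))
```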

The  exchange  energy  can  be  thought  of  as  giving  rise  to  a  surface  energy  on  domain 
boundaries.  The  associated  calculation  is  very  similar  to  the  calculation  of  the  surface  energy  on  an 
interface between fluid phases using the van der Waals theory. Anzellotti, Baldo and Visintin [1]
give a modern treatment of this calculation; the correct asymptotic scaling can be obtained either
from treatments of the van der Waals theory (e.g., Sternberg [17]) or from the famous 1935 paper
of Landau and Lifshitz [14]. Typical domain patterns in large bodies reveal a huge surface area of
domain walls, so we shall temporarily put $c = 0$.

Hence,  we  consider  the  problem 


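$$\inf_{\substack{M\in L^\infty(\Omega)\\ |M|=1}} \int_\Omega \phi(M(x))\,dx + \frac12\int_{\mathbb{R}^n} |Du|^2\,dx \qquad (4.3)$$

subject to

$$\operatorname{div}(-Du + M\chi_\Omega) = 0, \qquad u\in H^1(\mathbb{R}^n, \mathbb{R}). \qquad (4.4)$$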

Does (4.3) have an attained absolute minimum? In the uniaxial case, the answer is no if $\Omega$
is a smooth bounded domain. Intuitively, this can be seen from the following argument. To make
both the magnetostatic and anisotropy energies small, we would like to put

$$Du(x) = 0 \ \text{ a.e. } x\in\mathbb{R}^n, \qquad M(x) = \pm M_0 \ \text{ a.e. } x\in\Omega. \qquad (4.5)$$

Equation (4.5)$_1$, together with (4.4), shows that $M$ should be a divergence-free field in the weak sense. We can
construct divergence-free fields $M = \pm M_0$ on $\Omega$ by constructing columnar domains with boundaries
everywhere parallel to $M_0$, but these will not be divergence-free on $\mathbb{R}^n$ since the boundary
condition

$$M\cdot n = 0 \quad\text{on } \partial\Omega \qquad (4.6)$$

is not satisfied at points where the tops of the columns meet $\partial\Omega$. However, if we consider finer
and finer columns of equal volume (that fill $\Omega$), then the average value of $M$ will be nearly zero on
$\Omega$ and therefore will approximately satisfy (4.6). A minimizing sequence $M^{(k)}$ can be constructed
using exactly this idea, and for such sequences

$$M^{(k)} \overset{*}{\rightharpoonup} 0 \ \text{ in } L^\infty(\Omega, \mathbb{R}^n), \qquad |Du^{(k)}| \to 0 \ \text{ in } L^2(\mathbb{R}^n, \mathbb{R}).$$

The weak limit of $M^{(k)}$ is zero, which does not satisfy the constraint $|M| = 1$. This minimizing
sequence shows that the value of the infimum in (4.3) is zero, so any minimizer would
have to satisfy (4.4) and (4.5); as might be anticipated from (4.6), the equations (4.4) and (4.5)
have no solution $u\in H^1$, $M\in L^\infty$.
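The failure of the constraint under weak convergence can be seen in a one-dimensional cross-section taken across the columns. There $M_k = \pm M_0$ alternates on $2k$ strips of equal width, so $|M_k| = 1$ everywhere while $\int M_k\,g\,dx \to 0$ for every smooth $g$. The sketch below records only the component along $M_0$, uses the hypothetical test function $g(x) = 1 + x$, and approximates the integral by the midpoint rule. For this $g$ the exact value is $-1/(4k)$.

```python
import math

def column_field(k, x):
    """Component of M_k along M0 at x in [0, 1]: +1 on the first half of each of k periods, -1 on the second."""
    return 1.0 if math.sin(2.0 * math.pi * k * x) > 0.0 else -1.0

def weak_pairing(k, g=lambda x: 1.0 + x, N=2**16):
    """Midpoint-rule value of the integral of M_k * g over [0, 1] (N cells, N a multiple of 2k)."""
    h = 1.0 / N
    return h * sum(column_field(k, (j + 0.5) * h) * g((j + 0.5) * h) for j in range(N))

for k in (1, 8, 64):
    # |M_k| = 1 everywhere, yet the pairing with g behaves like -1/(4k) -> 0.
    print(k, weak_pairing(k), -1.0 / (4 * k))
```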

The  typical  microstructure  of  uniaxial  ferromagnets  (of  mm  size  or  greater)  consists  of  fine 
columnar domains parallel to the easy axis (i.e., parallel to $M_0$). A huge variety of
cross-sectional shapes is observed.

The cubic case is quite different, as might be anticipated from the textbook picture (Figure
4a) of domains in an iron crystal whose boundaries are (100) planes. Clearly, this picture
embodies a minimizer, since $M$ minimizes the anisotropy energy and is divergence-free on
$\mathbb{R}^3$. At first, this suggests that the minimum is attained only on special domains $\Omega$, but Figure 4b
suggests otherwise. Figure 4b shows a portion of $\partial\Omega$ and a divergence-free unit vector field $M$ on
$\Omega$. This field gets finer and finer at $\partial\Omega$, as indicated in the figure. Note that $M$ averages to zero as
$\partial\Omega$ is approached from inside $\Omega$, and it can be shown that for such a vector field,

$$u(x) = 0 \quad\text{a.e. } x\in\mathbb{R}^2.$$


Figure 4 (panels a and b). Minimizing domain structures for cubic ferromagnets with exchange energy
omitted.
Hence, this vector field $M\in L^\infty(\Omega, \mathbb{R}^2)$ represents an attained absolute minimum. A more refined
argument [10] based on Vitali’s Covering Theorem and the domain structure of Figure 4a suffices
to prove attainment for any bounded open set $\Omega$ with $\operatorname{meas}(\partial\Omega) = 0$.

Domain splitting near the boundary of cubic magnets is common, making the interpretation
of observations of domain patterns on $\partial\Omega$ extremely difficult. However, the phenomenon is often
explained from a different perspective by an analysis of Lifshitz [15], which has its origins in the
magnetostrictive contribution to the energy. A discussion of this point can be found in reference [10].
CONCURRENT SPECIFICATIONS AND THEIR
GUREVICH-HARRINGTON  GAMES  AND  REPRESENTATION  OF 

PROGRAMS  AS  STRATEGIES 
Caldwell Hall
Cornell  University 
Ithaca,  NY  14853 

ABSTRACT.  We  suggest  a  novel  way  to  view  concurrent  (possibly  perpetually  executing) 
programs.  Non-deterministic  choice  is  allowed.  We  regard  program  execution  as  a  play  of  a  game  of 
two  players,  which  we  call  a  computational  game.  One  player  (Prog,  which  stands  for 
“programmer”)  submits  sets  of  instructions  for  another  player  (Comp,  which  stands  for  “computer”) 
to execute. A program is represented by a strategy of Prog; a program specification is represented by a
winning condition. Our approach stems from the work of Rabin, Gurevich and Harrington on S2S,
and Buchi on game determinacy. We associate a computational game with a programming language and
give two examples of the simplest programs viewed as strategies in such a game. Programming
language constructs (including concurrent connection) correspond to operations over strategies
producing new strategies. These operations make it easy to associate with each program the strategy
it denotes. The operations are defined informally here and more precisely in the sequel to this
paper. Here we list properties of such operations over strategies. These properties
allow program verification proofs to be carried out when the program specification is represented as a winning
condition for a computational game. We illustrate program verification using Park’s example.
The  concurrent  program  specification  requirements  of  mutual  exclusion  and  absence  of  lockouts  are 
represented  by  Gurevich-Harrington  winning  conditions.  These  requirements  can  be  verified  for  any 
given program using the above properties. The idea of using techniques from the decidability results
belongs to Prof. Anil Nerode, who was also, to my knowledge, the first to state clearly that
programs could be understood as strategies in certain games. He also suggested many other valuable
ideas.

The  sequel  to  this  work,  “Extraction  of  Concurrent  Programs  from  Gurevich-Harrington 
Strategies,”  is  written  by  Vladimir  Yakhnis  and  is  included  in  this  collection. 

INTRODUCTION. 

Game theory is a traditional branch of Mathematics, Logic and, more recently, Computer
Science. Von Neumann and Morgenstern developed finite mathematical games for several players in the 1930s and 1940s.
Games for two players were put in their present form as “infinite games with
perfect information” in the work of Gale and Stewart (1953). Such games have been extensively used
in descriptive set theory [M], model theory and mathematical logic. The connection between
decidability of certain theories, determinacy of games and automata was explored by Rabin [R],
Gurevich and Harrington [GH], and Buchi [B].

The words “strategy” and “program behavior” are often used in the context of computer
programming. But these terms also have a precise game-theoretic meaning. We systematically interpret
such computer programming terms as computer program, execution sequence and program
specification game-theoretically. Then program development and program verification become
precise game-theoretical notions of finding a winning strategy or proving that a strategy is winning.

It appears to us that the game-theoretical meaning of a program as a strategy is more natural than the
meanings supplied by other approaches. Interestingly, this does not mean that matters become easy
immediately. This is because the relevant game-theoretical problems are different from the ones
typically considered in the theory of Gale-Stewart games. For example, instead of finding out
whether the game is determinate (i.e., whether one of the players wins), we want to know not only who
wins and what the winning strategy is, but also whether it is possible to find
a winning strategy in a designated class of strategies.

There  are  some  examples  of  such  theorems  in  game  theory.  They  are  [BL]  1967  and  [B]  1983. 
But  the  winning  strategies,  that  are  described  there,  are  so  complex  that  we  were  unable  to  use  them  in 
order  to  produce  concrete  programs. 

In 1982 Gurevich and Harrington published a proof of a game determinacy theorem for a class of
games which we shall call GH games. This theorem served as a tool in their celebrated short proof of
Rabin’s theorem. Their proof contains ingenious descriptions of winning strategies, but the strategies
are not explicitly given. Using their methods, we developed sufficient conditions for a given player
to win, which also give a wide class of explicit winning strategies. Our purpose is to use these
strategies in constructing concurrent programs.

We briefly compare our game-theoretical meaning of programs to that of temporal logic
(Manna & Wolper [MW], Lamport [L1], Gabbay et al. [GA], Manna & Pnueli [MP2]), automata
(Manna & Pnueli [MP1], Alpern & Schneider [AS]), Gries-Owicki (Owicki & Gries [OG], Lamport
[L2]), and denotational semantics (de Bakker & Zucker [BZ]).

We  regard  any  program  specification  given  informally  or  formally  as  a  winning  condition  in  a 
game  that  we  associate  with  a  given  programming  language. 

Unlike  Manna  &  Wolper  [MW],  we  do  not  extract  a  program  from  a  model  of  its  (temporal) 
specification.  Instead  we  rely  on  a  theorem  yielding 

1. Sufficient conditions for a given player to win;

2.  A  large  class  of  winning  strategies  in  case  the  conditions  hold.  A  program  is  then  designed 
from  a  winning  strategy. 

Following Buchi [B], we employ automata with output to represent strategies, while Manna &
Pnueli [MP1] and Alpern & Schneider [AS] use accepting automata to define execution sequences
satisfying the program specification. Automata are not absolutely essential for our present approach
and could be replaced by non-deterministic partial strategies.

In the temporal logic, Gries-Owicki and Manna-Pnueli-Alpern-Schneider approaches, the
meaning of a program is the set of its execution sequences. These approaches provide formalisms for
specifying properties of execution sequences and for program verification on that basis.

In contrast, we think that our notions of a program as denoting a strategy, and of an execution
sequence of a program as representing a play consistent with the strategy, reflect
programming practice more naturally, and with full mathematical precision. Our notion of play contains more
information than that of execution sequence, for example information about the computer delaying the
execution of submitted instructions. This information permits us to define interleaving naturally, and
to specify a wider variety of properties of the program execution process. The essential ingredient of
our approach is the use of theorems for finding winning strategies in classes of games that arise from
program specifications.

Denotational  semantics  views  a  program  as  a  function  over  a  mathematical  structure  designed  for 
a  given  programming  language.  Our  approach  shares  this  feature.  A  program  is  represented  by  a 
function  (i.e.  a  strategy)  over  a  game  tree,  which  is  a  mathematical  structure  encapsulating  rules  of  the 
game corresponding to a programming language. In the denotational semantics we have seen thus far
the  mathematical  structures  which  give  the  meaning  to  concurrency  are  so  complex  that  even 
reasonably  simple  programs  and  proofs  have  cumbersome  denotations.  We  think  that  our  approach 
gives  a  simple  and  very  intuitive  denotation. 

1.  PROGRAMS  AS  STRATEGIES 

Gale-Stewart games are played by two players. In the present paper we consider the following
version of their games, which we call computational games. The plays of a game can be either
infinitely long or finite. We shall call the players Prog and Comp. They alternate in making
moves. Prog plays first. A move of a player consists of choosing a letter $a\in\Sigma$ of a given
alphabet $\Sigma$ and appending it to the sequence obtained from previous moves. The resulting sequence of moves is a play
of a game:

$$a_1\, a_2 \ldots a_i \ldots$$

A finite initial segment of a play is called a position of the game. The set T of all positions is
called the game tree. The sets of positions where Prog (Comp) makes a move are called Pos(P) (Pos(C)).
They are the positions of even (odd) length. The game must specify rules restricting the possible
moves of the players, and a winning condition. We introduce the rules gradually in subsequent examples.
A winning condition, or winning set, for a given player is a collection of plays satisfying all the
restrictions on moves. We say that a player wins a play if it lies in the winning set of the player. A
play is finite if Prog has made the special move end. This is the last move of the play. If a play does not
include the move end, it is infinite.

The  usual  intuition  describes  a  program  as  some  orderly  way  of  submission  of  instructions  for  a 
computer to execute. In our framework each of the submissions constitutes a move of Prog and each
execution  of  an  instruction  is  a  move  of  Comp.  Informally,  a  Prog’s  move  is  a  set  of  instructions  (as 
opposed  to  a  singleton  in  deterministic  systems).  An  empty  set  of  instructions  is  called  skip.  A 
Comp’s  move  indicates  which  instruction  (if  any)  has  just  been  executed.  The  move  corresponding  to 
the  absence  of  executions  is  called  wait.  In  contrast  to  a  Prog’s  move,  only  execution  of  at  most  one 
instruction  is  allowed.  Multiple  executions  are  simulated  by  all  possible  orderings  of  the  respective 
consecutive  moves. 

However, apart from the instructions, each program contains a number of directives governing
the order in which the instructions are executed. In the case of a sequential program these directives
are the actual order of the instructions, go to’s, and structured statements like if...then...else,
while...do and so on. In the case of a concurrent program they are cobegin...coend’s, par’s, fork’s,
implicit directives contained in semaphores and so on. In our framework these directives govern the
behavior of Prog in the course of a play and therefore constitute a strategy for Prog.

A strategy for a given player is a function from all positions of the player to sets of
moves allowed for the player. We say that a player uses a strategy f from some position p on if,
at any later position q of a play where the player has to make his move, the player chooses a move
from the set f(q). We also say that a play in which a player uses a strategy f from some position p
on is consistent with the strategy at p. We consider the “state-strategies” developed by Buchi
and the “strategies with restricted memory” developed by Gurevich and Harrington.
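The definition of a play consistent with a strategy translates directly into a check over the prefixes of a play. The minimal sketch below uses hypothetical conventions of our own: positions are tuples of moves, a strategy is a function from positions to sets of moves, Prog moves at even-length positions, and a Prog move is a frozenset of instruction strings (skip is the empty frozenset) or the string "end".

```python
SKIP, END, WAIT = frozenset(), "end", "wait"

def is_consistent(play, f, player="Prog", start=0):
    """Check that every move of `player` made at a position q with len(q) >= start lies in f(q)."""
    parity = 0 if player == "Prog" else 1      # Prog moves at even-length positions
    for k in range(start, len(play)):
        if k % 2 == parity and play[k] not in f(tuple(play[:k])):
            return False
    return True

def idle(q):
    """Hypothetical Prog strategy: at every position only skip or end is allowed."""
    return {SKIP, END}

print(is_consistent([SKIP, WAIT, END], idle))                          # True
print(is_consistent([frozenset({"x:=1"}), WAIT, END], idle))           # False
print(is_consistent([frozenset({"x:=1"}), WAIT, END], idle, start=1))  # True
```

The `start` parameter plays the role of the length of the position p from which the strategy is used.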

We  shall  show  how  to  find  for  each  program  a  finite  state-strategy  for  Prog  which  represents 
precisely  the  directives  for  the  order  of  computations  prescribed  by  the  program.  We  think  of  this 
finite  state-strategy  as  the  meaning  of  the  program. 

We  relate  to  every  program  building  construct  an  operation  over  state-strategies.  This  is 
illustrated  by  examples. 
Suppose that we are given a PASCAL-like programming language L. Every program which
we consider consists of assignments, constructors if ... then ... else’s, while ... do’s, and a
concurrent constructor. Concurrent constructs have the form cobegin $P_1$; ...; $P_n$ coend, where
$P_1, \ldots, P_n$ are programming blocks, i.e., either subroutines, blocks begin ... end, or single
assignments. In distinction to the other approaches, we do not make any assumption about the
appearance or nesting of the concurrent constructs in the programs. They may appear anywhere and
may be nested in any possible way.

Let A be a finite set of instructions from L including x:=1.

EXAMPLE 1. Suppose that the program begin x:=1 end is being executed. We define the
following computational game. The Prog moves constitute the set $\mathcal{P}(A)\cup\{\text{end}\}$, called the Prog
alphabet $\Sigma^P$. We shall often use the notation skip for the empty move of Prog. The Comp moves
constitute the set $A\cup\{\text{wait}\}$, called the Comp alphabet $\Sigma^C$. The (disjoint) sum $\Sigma^P\cup\Sigma^C$ is called the
game alphabet $\Sigma$. The following sequence is an example of a play:
{x:=1}, wait, skip, wait, skip, x:=1, end.
The following two rules restrict the moves of the players.

(Rule 1) Each Comp move other than wait has to be a member of some previous Prog
move that Comp has not yet executed in his previous moves. We denote the set of all such permissible
Comp moves at a position p by Avail(p).

(Rule 2) Any move of Prog at a position p must be disjoint from Avail(p), i.e., Prog may
not include an instruction in his move if he has submitted this instruction previously and Comp has not
executed it yet.

For  the  play  above  the  function  Avail(p)  assumes  the  following  values  at  positions  of  the  play 
(beginning  with  the  root). 

$\emptyset$, {x:=1}, {x:=1}, {x:=1}, {x:=1}, {x:=1}, $\emptyset$, $\emptyset$
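Rules 1 and 2 determine Avail(p) move by move, so the values listed above can be recomputed mechanically. In the sketch below, whose names are hypothetical, a Prog move is a frozenset of instruction strings (skip is the empty frozenset) or the string "end", a Comp move is an instruction string or "wait", and the moves at even indices belong to Prog.

```python
SKIP, END, WAIT = frozenset(), "end", "wait"

def avail_sequence(play):
    """Return [Avail(p) for each position p of the play, root first], enforcing Rules 1 and 2."""
    avail = set()
    values = [frozenset(avail)]
    for k, move in enumerate(play):
        if k % 2 == 0:                      # Prog moves at even-length positions
            if move != END:
                if avail & move:
                    raise ValueError(f"Rule 2 violated by move {k}")
                avail |= move               # newly submitted instructions become available
        elif move != WAIT:                  # Comp executes one available instruction
            if move not in avail:
                raise ValueError(f"Rule 1 violated by move {k}")
            avail.discard(move)
        values.append(frozenset(avail))
    return values

play = [frozenset({"x:=1"}), WAIT, SKIP, WAIT, SKIP, "x:=1", END]
print([sorted(v) for v in avail_sequence(play)])
# [[], ['x:=1'], ['x:=1'], ['x:=1'], ['x:=1'], ['x:=1'], [], []]
```

Resubmitting x:=1 before it is executed, or executing an instruction that was never submitted, raises an error.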
Prog and Comp moves may be defined by means of a special Moore automaton called Exec,
which models the states of an operating system and a computer memory. The automaton alphabet is
$\Sigma$. A state of Exec has two components. The first component, called a machine state, is an
assignment of values to all variables occurring in A. We may assume, for example, that the values are
rational numbers. We designate the set of all machine states by $S_A$. The second component is a subset
of A. It is intended to represent Avail(p). So the set of Exec states is $S_A\times\mathcal{P}(A)$. The set of initial
states is $S_A\times\{\emptyset\}$. The transition table M is described as follows. If $(s,U)\in S_A\times\mathcal{P}(A)$ and
$\sigma\in\Sigma^P$, then $M((s,U),\sigma) = (s, U\cup\sigma)$, where end is identified with $\emptyset$. If $\delta\in\Sigma^C$, then
$M((s,U),\delta) = (s', U - \{\delta\})$, where {wait} is identified with $\emptyset$ and $s'$ is the machine state that results
from the execution of the instruction $\delta$ in the machine state s.

There are two output functions defined on states of Exec: $f^P(s,U) = \mathcal{P}(A-U)\cup\{\text{end}\}$ and
$f^C(s,U) = U\cup\{\text{wait}\}$, which represent the vacuous strategies for Prog and Comp, respectively. The value
of the vacuous strategy for a player ($\mathrm{Vac}^P$ for Prog and $\mathrm{Vac}^C$ for Comp) at a position p is the
set of all possible moves the player can make according to the game rules. If (s,U) is the last state
of the Exec run on p, then $\mathrm{Vac}^P(p) = f^P(s,U)$ or $\mathrm{Vac}^C(p) = f^C(s,U)$, depending on whether
$p\in\mathrm{Pos}(P)$ or $p\in\mathrm{Pos}(C)$. In fact $\Sigma$ is somewhat more complicated than we have just described. We
shall give more precise explanations later (Section 2).
If s is a machine state, y is a variable and v is some value, then s[y/v] is the state which
assigns to every variable other than y the same value as s does, and which assigns v to y.

Let (s, $\emptyset$) be an initial Exec state. Then the (s, $\emptyset$)-run of Exec over the above play is
(s, $\emptyset$), (s, {x:=1}), (s, {x:=1}), (s, {x:=1}), (s, {x:=1}), (s, {x:=1}), (s[x/1], $\emptyset$), (s[x/1], $\emptyset$).
We call the run of Exec over a position (or a play), regarded as a word in the game alphabet, the Exec run over that position (or
play).
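The transition table M and the two output functions of Exec can be sketched directly. The sketch assumes, hypothetically and for illustration only, that every instruction is a constant assignment written "var:=integer" and that the initial machine state is x = 0. A state is a pair (machine state, Avail set), as in the text. Run on the play of Example 1, it reproduces the (s, ∅)-run listed above.

```python
from itertools import chain, combinations

SKIP, END, WAIT = frozenset(), "end", "wait"

def execute(instr, s):
    """Hypothetical interpreter: only constant assignments written 'var:=integer'."""
    var, value = instr.split(":=")
    s2 = dict(s)
    s2[var.strip()] = int(value)
    return s2

def exec_step(state, move, prog_turn):
    """Transition table M of Exec on one move; a state is (machine state, Avail set)."""
    s, U = state
    if prog_turn:
        return (s, U if move == END else U | move)   # end is identified with the empty set
    if move == WAIT:
        return (s, U)                                # {wait} is identified with the empty set
    return (execute(move, s), U - {move})

def exec_run(play, s0):
    state = (dict(s0), frozenset())
    run = [state]
    for k, move in enumerate(play):
        state = exec_step(state, move, prog_turn=(k % 2 == 0))
        run.append(state)
    return run

def f_prog(state, A):
    """Vacuous Prog output: every subset of A - U, plus end."""
    _, U = state
    rest = sorted(set(A) - U)
    subsets = chain.from_iterable(combinations(rest, r) for r in range(len(rest) + 1))
    return {frozenset(c) for c in subsets} | {END}

def f_comp(state):
    """Vacuous Comp output: any available instruction, or wait."""
    _, U = state
    return set(U) | {WAIT}

play = [frozenset({"x:=1"}), WAIT, SKIP, WAIT, SKIP, "x:=1", END]
for s, U in exec_run(play, {"x": 0}):
    print(s, sorted(U))
```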

Now we shall explain the notion of a non-deterministic state-strategy for a player $\Omega$. Let $\mathfrak{A}$
be a Moore automaton with input over the game alphabet $\Sigma$, with the set of states S, the set of initial
states $S_{in}$, a deterministic transition table M and the output function $F: S\to\mathcal{P}(\Sigma^{\Omega})$. If $s_0\in S_{in}$, then
$\mathfrak{A}$ and $s_0$ induce the following strategy f. Let $p\in\mathrm{Pos}(\Omega)$. Run $\mathfrak{A}$ on p from $s_0$. If s is the
last state of the run, take f(p) = F(s). It is easy to see that $\mathrm{Vac}^P$ and $\mathrm{Vac}^C$ above are non-deterministic
state-strategies.

For deterministic state-strategies the above simplifies, since in this case it is sufficient for $\mathfrak{A}$
to take as input only the moves of the opponent.

Informally, the program begin x:=1 end represents the following behavior of Prog in the
computational game:

1. Submit the set of instructions {x:=1};

2. Wait until the instruction is completed, by making skip moves;

3. Finish the play by submitting the end move.

We shall describe a deterministic state-strategy corresponding to begin x:=1 end by giving its
state diagram.

In Fig. 1, a represents begin x:=1 end, ovals represent states, thin arrows represent the input
and thick arrows represent the output.
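Since the strategy is deterministic, its automaton only needs to read Comp's moves. The sketch below encodes one such automaton with three states, whose names "submit", "await" and "finish" are our own and are not taken from Fig. 1. It replays the strategy against a scripted list of Comp replies; this scripted-opponent driver is a hypothetical testing device, not part of the formalism.

```python
SKIP, END, WAIT = frozenset(), "end", "wait"
INSTR = "x:=1"

# Output (a Prog move) of each automaton state; transitions are driven by Comp's moves only.
OUTPUT = {"submit": frozenset({INSTR}), "await": SKIP, "finish": END}

def delta(state, comp_move):
    """Move to 'finish' once Comp reports that x:=1 has been executed, otherwise keep waiting."""
    if state in ("submit", "await"):
        return "finish" if comp_move == INSTR else "await"
    return state

def play_against(comp_moves):
    """Generate the play produced by this strategy against a scripted list of Comp replies."""
    state, play, replies = "submit", [], iter(comp_moves)
    while True:
        move = OUTPUT[state]
        play.append(move)
        if move == END:
            return play
        reply = next(replies)   # raises StopIteration if the script runs out before end
        play.append(reply)
        state = delta(state, reply)

print(play_against([WAIT, WAIT, INSTR]))
```

Against the replies wait, wait, x:=1 this reproduces the play of Example 1. If Comp waits forever the play is infinite, which is exactly the situation covered by the convention on Comp's refusals.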

Let W be a collection of plays and T be the game tree. $\Gamma = \langle T, \Omega, W\rangle$ designates a game
where $\Omega$ wins a play $\mu$ iff $\mu\in W$ and, conversely, $1-\Omega$ wins a play $\mu$ iff $\mu\notin W$. W is called
the winning condition for $\Omega$. If the complement of W is designated $W^c = \mathrm{Play}(T) - W$, then we can
also write $\Gamma = \langle T, 1-\Omega, W^c\rangle$.

Now we shall reduce a program specification for the example to the notion of a winning
condition. Let (s, $\emptyset$) be an initial Exec state. The program specification “the program terminates
with x=1” is translated as “the set of all finite (i.e., terminated by end) plays $\mu$ such that the last machine
state on $\mu$ resulting from the (s, $\emptyset$)-run of Exec on $\mu$ is s[x/1].” Let us call this set of plays
$W^a$, where a stands for begin x:=1 end.

We adopt the following convention unless we say otherwise. In all computational games
Comp loses every infinite play in which he refuses to execute an instruction submitted by Prog. More
precisely, Comp loses every infinite play for which there is an instruction contained in the
set Avail(q) for all positions q after some position p. Let $W_r$ be the set of all such plays. Here r stands
for Comp's refusal to complete some submitted instruction.

Therefore we represent the program specification above by the game $\Gamma=\langle T,\mathrm{Prog},W_a\cup W_r\rangle$.
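To make the finite part of this winning condition concrete, here is a minimal Python sketch under assumptions of our own, not taken from the paper: a play is a finite list of moves; Prog's moves are frozensets of instruction names (the empty frozenset standing for skip) or the string "end"; Comp's moves are instruction names or "wait"; and Exec adds submitted instructions to the pending set U and, when Comp executes one, applies it and removes it from U. Infinite plays, and hence $W_r$, cannot be checked by running a program, so only $W_a$ is tested. The names exec_run, in_W_a and INSTRUCTIONS are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple, Union

State = Dict[str, int]
# Prog moves: frozenset of instruction names (frozenset() is skip) or "end".
# Comp moves: an instruction name or "wait".
Move = Union[FrozenSet[str], str]

# Hypothetical instruction table: name -> machine-state transformer.
INSTRUCTIONS: Dict[str, Callable[[State], State]] = {
    "x:=1": lambda s: {**s, "x": 1},
    "x:=2": lambda s: {**s, "x": 2},
}


def exec_run(s: State, play: List[Move]) -> Tuple[State, FrozenSet[str]]:
    """Run Exec from (s, empty set) along a finite play; return the last (state, U)."""
    pending: FrozenSet[str] = frozenset()
    for move in play:
        if isinstance(move, frozenset):      # Prog submits instructions
            pending = pending | move
        elif move in ("wait", "end"):        # no effect on the machine state
            continue
        elif move in pending:                # Comp executes a submitted instruction
            s = INSTRUCTIONS[move](s)
            pending = pending - {move}
        else:
            raise ValueError(f"{move!r} was executed without being submitted")
    return s, pending


def in_W_a(s: State, play: List[Move]) -> bool:
    """Membership in W_a: a play terminated by end whose last state is s[x/1]."""
    if not play or play[-1] != "end":
        return False
    final, _ = exec_run(s, play)
    return final == {**s, "x": 1}


# Prog: {x:=1}; Comp: wait; Prog: skip; Comp: x:=1; Prog: end.
play = [frozenset({"x:=1"}), "wait", frozenset(), "x:=1", "end"]
print(in_W_a({"x": 0}, play))  # True
```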

Let f be an $\Omega$-strategy. We say that f is perpetual at p if for any nonterminal position
$q\in\mathrm{Pos}(\Omega)$ consistent from p with f we have $f(q)\neq\emptyset$. We say that f is conditionally winning $\Gamma$ at
p if every play containing p and consistent after p with f is in W. Finally, we say
that f is winning $\Gamma$ at p if f is perpetual at p and f is conditionally winning $\Gamma$ at p. If p is
the root $\varepsilon$, then we omit references to positions in the above definitions. Lachlan [LAC] (1970) also
used the notion of a perpetual strategy, though he named it differently.

It is easy to see that the strategy begin x:=1 end wins $\Gamma$.

EXAMPLE 2. We begin with the program cobegin x:=1, x:=2 coend. We assume that x:=1
and x:=2 are in A, and so Prog's and Comp's alphabets are the same as in Example 1.

We shall use the abbreviations a for x:=1 and b for x:=2.

The program represents the following Programmer behavior in the computational game.

1. Submit the set of instructions {x:=1, x:=2}.

2. Wait until each submitted instruction is completed, by making skip moves.

3. Finish the play by submitting the end move.

Below  is  its  state  diagram. 
FIG. 2

In the diagram above, big arrows correspond to the output (i.e. moves of Prog) and small
arrows correspond to the input (i.e. moves of Comp).

Now, let $W_b$ be defined by replacing $s[x/1]$ with $s[x/2]$ in the definition of $W_a$ in Example
1. Then the program specification "the program terminates with x=1 or x=2" corresponds to the
game $\langle T,\mathrm{Prog},W_a\cup W_b\cup W_r\rangle$. It is easy to see that the above strategy wins this game.

The consideration of the last example is based on the notion of mutual atomicity. Suppose that
$A_1,\dots,A_n$ are respectively the collections of assignments from the blocks $P_1,\dots,P_n$ of the
concurrent construct above. We assume here, as in other models of concurrency, that $A_1,\dots,A_n$
must be mutually atomic. In the simplest case, the assignments $a_1$ and $a_2$ are called mutually
atomic if the result of a concurrent execution of $a_1$ and $a_2$ is always the same as that of the
sequential execution of either $a_1;\,a_2$ or $a_2;\,a_1$.

Assume that x=0 and that cobegin x:=x+1, x:=x+1 coend is being executed. Suppose that there
are two processors, each of which independently computes x:=x+1. Assume that any assignment
consists of two separate steps, i.e. (1) computing the right-hand side and (2) putting the result into
the location which corresponds to the left-hand side. Then the following scenario is feasible. At
moment $t_1$ both processors compute x+1 (which is equal to 1), at moment $t_2$ the first processor
puts the result in x, and at moment $t_3$ the second processor puts the result in x. It is easy to see that
after all these operations are finished, we have x=1. Since begin x:=x+1; x:=x+1 end gives us
x=2, we must conclude that x:=x+1 and x:=x+1 are not mutually atomic.
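The lost-update scenario can be reproduced by enumerating every order-preserving interleaving of the two processors' steps. A small Python sketch, modeling each assignment x:=x+1 as the two steps (1) compute and (2) store described above; the helper names interleavings and run are hypothetical.

```python
from typing import Iterator, List, Tuple

Step = Tuple[int, str]  # (processor, "compute" or "store")


def interleavings(a: List[Step], b: List[Step]) -> Iterator[List[Step]]:
    """All merges of a and b that keep each processor's own steps in order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule: List[Step]) -> int:
    """Execute a schedule on a shared x that starts at 0."""
    x = 0
    local = {}                      # each processor's computed right-hand side
    for proc, step in schedule:
        if step == "compute":       # step (1): evaluate x + 1
            local[proc] = x + 1
        else:                       # step (2): store the result into x
            x = local[proc]
    return x


p1: List[Step] = [(1, "compute"), (1, "store")]
p2: List[Step] = [(2, "compute"), (2, "store")]
outcomes = {run(s) for s in interleavings(p1, p2)}
print(sorted(outcomes))  # [1, 2]
```

The outcome 1 arises exactly in the schedules where both compute steps precede both store steps, which is the scenario at moments $t_1$, $t_2$, $t_3$.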

Our  game-theoretical  approach  allows  us  to  deal  with  programs  without  the  assumption  of 
mutual  atomicity  as  above.  However,  the  model  in  this  case  is  more  complicated  and  we  shall  not 
consider  it  here. 

2. VERIFICATION OF PARK'S EXAMPLE.

This section is intended to give an informal example of a direct verification proof based on
game-theoretic notions. This proof ultimately rests on the material of the next section, which provides a
basis for any proofs of this sort. There are, however, other ways to carry out game-theoretic verification.

Since Park's example involves the predicate x=0, we extend the computational game alphabet to
account for predicates. We often use the term instruction for either a predicate or a true instruction. Let
$\Phi$ be a collection of predicates admissible by the programming language. Prog's moves
constitute the set $\Sigma^P=\mathcal{P}(A\cup\Phi)\cup\{end\}$. The notation skip is still used for the empty move of Prog.
Comp's moves constitute the set $\Sigma^C=A\cup(\Phi\times\{t,f\})\cup\{wait\}$, based on the set $\{t,f\}$ of truth
values. The game alphabet is $\Sigma=\Sigma^P\cup\Sigma^C$. Rules 1 and 2 of Section 1 are easily applicable in
the present context if the references there to Comp's moves that include predicates are understood
as referring only to the predicate component of the move. The alphabet and transition table of Exec are
extended to account for the larger game alphabet as follows. If $\delta=\langle\varphi,b\rangle$ is a Comp's move, then
$M((s,U),\delta)=(s,U-\{\varphi\})$, where $\varphi$ holds in the machine state s iff $b=t$. That is, a Comp's move
containing a predicate also contains the truth value of the predicate at the Exec's machine state s at which the move
has been made.

The program of the form while $\varphi$ do d, where d is an instruction, is used in the example. The
following state diagram describes the respective state-strategy.
Note that with each state-strategy u there is a naturally associated alphabet $\Sigma^u$, which is the
union of its input and output alphabets. This permits us to say that a move occurs in u, meaning that it
occurs in the above alphabet of u.

We consider the program g = begin x:=0; y:=0; par(begin x:=1 end, while x=0 do y:=y+1)
end. It is required to show that g terminates in an Exec's state s satisfying $(x=1 \wedge \exists n\in\omega\; y=n)$.
Call the latter predicate $\psi(x,y)$. We shall restate this as the winning condition of the computational
game. The computational game alphabet can be restricted to be based on
A = {x:=0, y:=0, x:=1, y:=y+1} and $\Phi=\{x=0\}$. Let $s(p)$ denote the Executive state arising after the last
Comp's move of a position p. Let $p^-$ denote the position obtained from p by removing its last move.
Let $W=\{\mu: \mu \text{ is a terminal play and } s(\mu^-)\models\psi(x,y)\}$. Let $W_r$ be the Computer refusal set for the
game, i.e. the set of all infinite plays such that, from some position on, there is an item
which always occurs in the set Avail and never occurs in any subsequent
Computer move. Computer always loses the set $W_r$. So we are to show that g wins the set
$W\cup W_r$.

PROPOSITION 2.1. The strategy g wins the set $W_r\cup W$.

Proof. Note that g is a perpetual strategy, because it submits only instructions defined
on all Exec's machine states. It remains to show that g is conditionally winning. It is sufficient to
show that if $\mu$ is a play in which Prog uses g and $\mu$ is not in $W_r$, then $\mu$ is in W. Since the use of g
involves first the use of the strategy h = begin x:=0; y:=0 end, it follows that there is a position p of
$\mu$ where h is used last and $s(p)\models(x=0 \wedge y=0)$. We may write $\mu=p\cdot\eta$, where $\eta$ is a play that
begins at p.

(P1) The initial Exec's state $s_0(\eta)$ for the play $\eta$ is $s(p)$. Hence it satisfies

$s_0(\eta)\models(x=0 \wedge y=0)$.

We shall show that the play $\eta$ has a position q satisfying

(P2) all positions r of $\eta$ following (and including) q satisfy $s(r)\models(x=1)$, and

(P3) all positions r of $\eta$ (strictly) preceding q satisfy $s(r)\models(x=0)$.

These will be used to show that $\eta$ is a finite play. It then immediately follows from (P2) that
the terminating Exec's state satisfies the first conjunct of $\psi(x,y)$. Then we shall show that this Exec's
state also satisfies the remaining conjunct.

Since g submits the instruction x:=1 immediately after the use of h has been completed, and
$\mu\notin W_r$, it follows that Comp makes a move $\delta=(x{:=}1)$ in $\mu$ after p, i.e. $\delta$ occurs in $\eta$. It
immediately follows that (P2) and (P3) hold at the position q of $\eta$ arising after this move, because
there are no other moves involved in g that can affect the variable x.

Observe that Prog uses the strategy par( , ) along $\eta$. Denote u = begin x:=1 end and v = while
x=0 do y:=y+1. It can be noticed that all Prog's moves in $\eta$ occur either in u or in v. If
Comp's moves from u are replaced by wait and Prog's moves from u are deleted from $\eta$, then Prog
uses v in the resulting play. It follows from (P2) that this play is terminating.
Hence $\eta$ is terminating, and so is $\mu$.

It remains to show that $s(\eta^-)\models(y\in\omega)$. We shall show by induction that this holds for all prefixes
of $\eta^-$. The induction base holds due to (P1). Every Comp's move in $\eta$ either does not affect the
variable y or is an instruction y:=y+1, which preserves the desired property. This completes the
induction and the proof.
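Proposition 2.1 can be sanity-checked (not proved) by sampling plays. The Python sketch below abstracts Prog's skip moves away and lets Comp choose uniformly at random which pending request to serve next: the instruction x:=1 of u, or v's next request (the test x=0 or the increment y:=y+1). With probability 1 every pending request is eventually served, so the sampled plays stay outside $W_r$. The function name park and the max_steps guard are hypothetical.

```python
import random
from typing import Tuple


def park(seed: int, max_steps: int = 100_000) -> Tuple[int, int]:
    """Sample one play of g = begin x:=0; y:=0; par(begin x:=1 end,
    while x=0 do y:=y+1) end, with Comp picking uniformly at random
    which pending request to serve next."""
    rng = random.Random(seed)
    x, y = 0, 0                 # effect of h = begin x:=0; y:=0 end
    u_done = False              # has Comp executed x:=1 (strategy u)?
    v_next = "test"             # v's next request: "test", "incr" or "done"
    for _ in range(max_steps):
        if u_done and v_next == "done":
            return x, y         # the play is terminal
        ready = [t for t, live in (("u", not u_done), ("v", v_next != "done")) if live]
        if rng.choice(ready) == "u":
            x, u_done = 1, True                     # Comp's move x:=1
        elif v_next == "test":                      # Comp's move <x=0, b>
            v_next = "incr" if x == 0 else "done"
        else:                                       # Comp's move y:=y+1
            y, v_next = y + 1, "test"
    raise RuntimeError("no termination within max_steps")


results = [park(seed) for seed in range(5)]
print(results)
```

Every sampled final state has x=1 and some natural number y, matching $\psi(x,y)$; the value of y depends on how long Comp delays x:=1.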


3.  PROPERTIES  OF  OPERATIONS  OVER  STATE-STRATEGIES. 


The operations over state-strategies corresponding to program connecting constructs (sequential
connection of programs, concurrent connection of programs, conditional connection of programs and
conditional repetition of a program) are defined in the sequel to this paper. Here we state the properties
of the operations over state-strategies. The operations over strategies are named similarly to the respective
program connecting constructs.

To state the properties of concurrent connection of strategies, we need a less restrictive notion of a game
than a computational game. We call such a game a free computational game. It differs from a
computational game in omitting all references to the automaton Exec in the game rules. Note that Rules
1 and 2 are still valid for a free computational game.

We first define a split of any position of a free game with respect to (Prog's) strategies g and h
and an arbitrary position q of the game. Let $\Sigma^g$ and $\Sigma^h$ be the alphabets associated with the
strategies as explained in Section 2. We assume these alphabets to be disjoint. A split of a letter $\delta$
from the Computer alphabet $\Sigma^C$ is the pair of unique letters $Sp^g(\delta)=(\text{if } \delta\in\Sigma^g \text{ then } \delta \text{ else } wait)$ and
$Sp^h(\delta)=(\text{if } \delta\in\Sigma^h \text{ then } \delta \text{ else } wait)$. A split of a letter $\sigma$ from the Programmer alphabet $\Sigma^P$ is the
pair of unique letters $Sp^g(\sigma)=\sigma\cap\Sigma^g$ and $Sp^h(\sigma)=\sigma\cap\Sigma^h$. If r is any word, let $Sp^g(r)$ be the word
obtained from r by replacing each of its letters by its $Sp^g$-image (and similarly for $Sp^h$). If $r=q\cdot r'$, let $Sp^g_q(r)=q\cdot Sp^g(r')$ and
$Sp^h_q(r)=q\cdot Sp^h(r')$.
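The split maps are simple letter-by-letter functions. A Python sketch, assuming Prog letters are frozensets of instruction names and Comp letters are instruction names or "wait"; the end move is left out because the text does not say how it is split. The function names sp, sp_word and sp_after and the example alphabets are hypothetical.

```python
from typing import FrozenSet, List, Set, Union

# Prog letters: frozensets of instructions; Comp letters: names or "wait".
Letter = Union[FrozenSet[str], str]


def sp(letter: Letter, alphabet: Set[str]) -> Letter:
    """Sp^g of one letter, where `alphabet` plays the role of Sigma^g."""
    if isinstance(letter, frozenset):                 # Programmer letter: intersect
        return letter & frozenset(alphabet)
    return letter if letter in alphabet else "wait"   # Computer letter


def sp_word(word: List[Letter], alphabet: Set[str]) -> List[Letter]:
    """Sp^g(r): split every letter of the word r."""
    return [sp(letter, alphabet) for letter in word]


def sp_after(q: List[Letter], r_rest: List[Letter], alphabet: Set[str]) -> List[Letter]:
    """Sp^g_q(q . r') = q . Sp^g(r'): the prefix q is kept unchanged."""
    return list(q) + sp_word(r_rest, alphabet)


sigma_g = {"x:=1"}                 # hypothetical alphabet of g
sigma_h = {"y:=y+1"}               # hypothetical alphabet of h
word = [frozenset({"x:=1", "y:=y+1"}), "y:=y+1", frozenset(), "x:=1"]
print(sp_word(word, sigma_g))  # [frozenset({'x:=1'}), 'wait', frozenset(), 'x:=1']
```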

For any play $\mu$ and its position r, let $\mu_r$ be the play arising by deleting the prefix r from $\mu$. For
any natural number n let $\mu_n$ be the play obtained by deleting the prefix of length n from $\mu$.

PROPOSITION 3.1

A. If a play $\mu$ of a free computational game is consistent with the concurrent connection of two
strategies g and h, then there are two plays $\mu'$ and $\mu''$, consistent with g and h respectively
in the respective free games, such that

1. the play $\mu$ is terminal if and only if $\mu'$ and $\mu''$ are terminal;

2. if $\mu'$ and $\mu''$ are both not terminal, then for every n, $\mu'(n)=Sp^g(\mu(n))$ and $\mu''(n)=Sp^h(\mu(n))$;

3. if $\mu'$ is terminal and $\mu''$ is not, then there is a prefix r of $\mu$ s.t. $(\mu')^-=Sp^g(r)$,
$\mu''_{\mathrm{length}(r)}=\mu_r$, and for every $n<\mathrm{length}(r)$, $\mu''(n)=Sp^h(\mu(n))$;

4. if $\mu''$ is terminal and $\mu'$ is not, then there is a prefix r of $\mu$ s.t. $(\mu'')^-=Sp^h(r)$,
$\mu'_{\mathrm{length}(r)}=\mu_r$, and for every $n<\mathrm{length}(r)$, $\mu'(n)=Sp^g(\mu(n))$.

Below, $\mu'$ and $\mu''$ are both terminal. Then one of the following three possibilities holds.

5. $(\mu')^-=Sp^g(\mu^-)$ and $(\mu'')^-=Sp^h(\mu^-)$;

6. $(\mu')^-=Sp^g(r)$ and $(\mu'')^-_{\mathrm{length}(r)}=(\mu^-)_r$ for some proper prefix r of $\mu^-$, and for every
$n<\mathrm{length}(r)$, $\mu''(n)=Sp^h(\mu(n))$;

7. $(\mu'')^-=Sp^h(r)$ and $(\mu')^-_{\mathrm{length}(r)}=(\mu^-)_r$ for some proper prefix r of $\mu^-$, and for every
$n<\mathrm{length}(r)$, $\mu'(n)=Sp^g(\mu(n))$.

B. For all state-strategies g, h, u, $(g\|h)\|u$ is equivalent to $g\|(h\|u)$ in a free computational
game. ■

Sequential connection of two strategies. The sequential use of two strategies h and g,
written h; g, is defined when h satisfies the following condition: any state that can shift into a final state can
shift only into final states. The strategy h; g is the strategy whose use consists in using the first
strategy until it reaches (if ever) its final state in the course of a play; at this position of the play
the final state is forgotten and Programmer uses the second strategy from its initial state.


PROPOSITION 3.2

A. If a play $\mu$ of a free computational game is consistent with the sequential connection of
strategies h; g at a position p, then either

1. $\mu$ is infinite and is consistent with h from p, or

2. $\mu=\eta^-\cdot\xi$ for some finite play $\eta$ consistent with h from p and some play $\xi$ consistent
with g from the root. Also, the initial Executive state for the play $\xi$ has to coincide with an
element of $s(\eta^-)$.

B. For any state-strategies g, h, u, (g; h); u is equivalent to g; (h; u). ■

The conditional connection of two strategies. The strategy $\varphi?$ evaluates the predicate $\varphi$ and
memorizes the obtained truth value by terminating in one of two distinct states.

It is convenient to consider two related strategies. The first strategy is denoted $\varphi^t?$. It differs
from the strategy $\varphi?$ by being undefined whenever a Comp's move is not $\langle\varphi,t\rangle$ or wait. The
second strategy is denoted $\varphi^f?$. It differs from the strategy $\varphi?$ by being undefined whenever a
Comp's move is not $\langle\varphi,f\rangle$ or wait.

Conditional use of strategies g, h depending on the predicate $\varphi$, beginning from a given position
p, consists of the following. Use the strategy $\varphi?$ from p. If the final state reached by $\varphi?$ in the
course of its use corresponds to the outcome true, use g from this position on; if the final state
corresponds to the outcome false, use h. We denote this strategy by if $\varphi$ then g else h.

The  following  property  characterizes  the  conditional  use  of  state-strategies. 

PROPOSITION 3.3 A play $\mu$ of a free computational game is consistent with the state-strategy
if $\varphi$ then g else h from a position q iff

where $\eta$ is terminal and is consistent with $\varphi?$ from q, x is a Comp's move with
$x=\langle\varphi,t\rangle$ or $x=\langle\varphi,f\rangle$, and either

a. $\xi$ is consistent with g at the root and $x=\langle\varphi,t\rangle$, or

b. $\xi$ is consistent with h at the root and $x=\langle\varphi,f\rangle$. ■

Use of a strategy while a certain condition holds. The strategy which uses a given strategy g while
a predicate $\varphi$ holds is denoted by while $\varphi$ do g. Its use in a play consists in a use of $\varphi?$ and
termination if the final state of $\varphi?$ corresponding to Comp's evaluation of the predicate to
false is reached. Otherwise, g is used sequentially after $\varphi?$. If g reaches one of its final states, $\varphi?$ is used
again and the preceding part of the description applies.

The  characteristic  property  of  a  repetitive  use  of  a  strategy  follows. 

PROPOSITION 3.4 A play $\mu$ of a free computational game is consistent with the strategy while
$\varphi$ do g from q iff

1. there are $n\ge 1$ and plays $\xi_1,\dots,\xi_n$ such that $\mu=\xi_1^-\cdots\xi_{n-1}^-\cdot\xi_n$, for $i<n$ each
$\xi_i$ is consistent with $\varphi^t?;g$ and is terminal, and $\xi_n$ is either consistent with $\varphi^f?$ or with
$\varphi^t?;g$, and in the latter case $\xi_n$ is infinite; or

2. there is an infinite sequence of finite plays $\xi_1,\xi_2,\dots$ such that each $\xi_i=\delta_i^-\cdot\kappa_i^-$ for some
finite play $\delta_i$ consistent with $\varphi^t?$ from the root and some finite play $\kappa_i$ consistent with g
from the root (both $\delta_i$ and $\kappa_i$ depend on $\xi_i$), and $\mu=\xi_1\cdot\xi_2\cdots\xi_n\cdots$. ■

4.  GUREVICH-HARRINGTON  GAMES  AND  THE  MUTUAL  EXCLUSION  PROBLEM 

Gurevich and Harrington (GH) considered winning conditions in the form of a Boolean
combination of the sets $[C_1],\dots,[C_n]$, where $C_i$ is a subset of the game tree and $[C_i]$ is the set of
plays with infinitely many intersections with $C_i$. They proved that one of the players has a winning
strategy with restricted memory. In many cases their strategies with restricted memory are
non-deterministic finite state-strategies, and as such they could be simulated by (concurrent) programs.

However, since the GH determinacy result includes neither a criterion for which player wins nor an
explicit description of the winning strategy, it is not by itself sufficient for our purpose of finding a
concurrent program corresponding to a specification. By analysing their proof we have found a
sufficient condition for a given player to win and an explicit description of a winning strategy. This
will be given in the second part of the talk. We have found that this condition encompasses the
specification of the Mutual Exclusion Problem if stated in game-theoretic terms. We shall now
convert the specification for a mutual exclusion problem into a Gurevich-Harrington winning
condition.

Suppose that we are given n parallel processes of the form

    repeat
        crit_i;
        rem_i;
    until false

and assume that each critical section $crit_i$ requires the use of a resource t, while the remainder $rem_i$
does not have such a requirement. Further assume that we have only $k\le n$ units of the resource t.

The classical mutual exclusion problem is to modify the processes so as to ensure the following.

1. No more than k processes can be in their critical sections at the same time. ("The absence of
clashes" requirement.)

2. No process can wait indefinitely long in order to be allowed to enter its critical section.
("The absence of lockouts or deadlocks" requirement.)

3. There is $m\in\omega$ such that if a process is waiting for permission to enter its critical section,
then during this period of time no more than m other processes are allowed to enter their
respective critical zones. ("The bound" requirement.)

4. If some of the processes stay in their respective remainder sections indefinitely
long, this should not affect the execution of the other processes. (The requirement of
tolerance to failure.)

In order for a solution to be possible, the following precondition is required.

5. k processes must not occupy their respective critical zones indefinitely long. (An assumption
of absence of failure while at least k processes are in their critical sections.)

In order to represent the above requirements in the form of a winning condition, we first assume
that in addition to assignments the set A (see Section 2) also includes the elements $\{crit_i, rem_i : i=0,\dots,n-1\}$.
Let the alphabet $\Sigma$ be as above. For convenience, we shall not distinguish
between subsets $a\subseteq A$ and their characteristic functions. So, for example, for $a\subseteq A$ we shall write
$a(crit_i)=1$ if $crit_i\in a$ and $a(crit_i)=0$ if $crit_i\notin a$.

For simplicity, we shall formalize the first two (most important) requirements and omit the rest.
Then there is the following correspondence between subsets of the game tree and these
requirements.

1. $\mathrm{MutExcl}=\{p: \text{for each prefix } q \text{ of } p,\ \sum_{i\in n}\mathrm{Avail}(q)(crit_i)\le k\}$ corresponds to the
positions never violating the first requirement;

2. $\mathrm{NoLock}_i$, the set of positions p such that if p contains a Comp's move $\delta$ s.t. $\delta=rem_i$ then p contains a Prog's
move $\sigma$ after $\delta$ s.t. $\sigma(crit_i)=1$, and if p contains a Comp's move $\delta$ s.t. $\delta=crit_i$ then p
contains a Prog's move $\sigma$ after $\delta$ s.t. $\sigma(rem_i)=1$, corresponds to the second
requirement.

The winning set corresponding to requirements 1 and 2, disregarding all others, is the
following Gurevich-Harrington set: $W=[\mathrm{MutExcl}]\cap\bigcap_{i\in n}[\mathrm{NoLock}_i]$.
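GH sets such as $[\mathrm{MutExcl}]$ and $[\mathrm{NoLock}_i]$ speak about infinite plays, so a program can only test their finite building blocks. The following Python sketch checks whether a single finite position lies in MutExcl or in $\mathrm{NoLock}_i$. It assumes that Avail(q) is the set of instructions submitted by Prog and not yet executed by Comp after q (our reading of Avail), and it encodes $crit_i$ and $rem_i$ as the strings "crit0", "rem0", and so on; these names and the helper functions are hypothetical.

```python
from typing import FrozenSet, List, Union

# Prog moves: frozensets of elements of A; Comp moves: an element of A or "wait".
Move = Union[FrozenSet[str], str]


def avail_after_prefixes(play: List[Move]) -> List[FrozenSet[str]]:
    """Avail(q) for every prefix q, assuming Avail = submitted but not yet executed."""
    pending: FrozenSet[str] = frozenset()
    result = [pending]                       # Avail of the empty prefix
    for move in play:
        if isinstance(move, frozenset):
            pending = pending | move         # Prog submits
        elif move != "wait":
            pending = pending - {move}       # Comp executes
        result.append(pending)
    return result


def in_mut_excl(play: List[Move], n: int, k: int) -> bool:
    """Every prefix q satisfies sum_i Avail(q)(crit_i) <= k."""
    crit = frozenset(f"crit{i}" for i in range(n))
    return all(len(avail & crit) <= k for avail in avail_after_prefixes(play))


def in_no_lock(play: List[Move], i: int) -> bool:
    """Each Comp move rem_i (crit_i) is later answered by a Prog move containing crit_i (rem_i)."""
    answer = {f"rem{i}": f"crit{i}", f"crit{i}": f"rem{i}"}
    for j, move in enumerate(play):
        if isinstance(move, str) and move in answer:
            later = play[j + 1:]
            if not any(isinstance(m, frozenset) and answer[move] in m for m in later):
                return False
    return True


good = [frozenset({"crit0"}), "crit0", frozenset({"rem0"}), "rem0", frozenset({"crit0"})]
bad = [frozenset({"crit0"}), "wait", frozenset({"crit1"})]
print(in_mut_excl(good, n=2, k=1), in_mut_excl(bad, n=2, k=1))  # True False
```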
EXTRACTION OF CONCURRENT PROGRAMS FROM
GUREVICH-HARRINGTON  GAMES 

Vladimir  Yakhnis 
Mathematical  Sciences  Institute 
Caldwell  Hall 
Cornell  University 
Ithaca,  NY  14853 

ABSTRACT. To a programming language we assign a class of two-player games called
computational games. In each game the first player, called Prog, gives sets of instructions to another
player, called Comp, who executes them. Programs are strategies of Prog, and program
specifications are winning conditions. We construct an algebra of strategies which constitute all
meanings of the programs (including concurrent programs) in the language. As winning conditions
we consider the ones used by Gurevich and Harrington (GH) in their celebrated short proof of Rabin's
Theorem. We give a new Theorem providing a sufficient condition for a given player to win a GH
game, together with a wide class of explicit winning strategies. To create the class of winning strategies, we
use a new notion of Priority Automata, generalizing GH's notion of Latest Appearance Record
(LAR). This sufficient condition is applicable to the Mutual Exclusion Problem formulated as a GH
game. So, using the Theorem, we can find a class of winning strategies for Prog in a corresponding
computational game. Using the above algebra of strategies, it is then possible to find a strategy
in the class of winning strategies which corresponds to a concurrent program. The idea of using
techniques from the decidability results belongs to Prof. Anil Nerode, who was also, to my
knowledge, the first to state clearly that programs could be understood as strategies in certain games. He
also suggested many other valuable ideas.

This work is a sequel to "Concurrent specifications and their Gurevich-Harrington games and
representation of programs as strategies," written by Alexander Yakhnis, which is included in this
collection. We assume familiarity with that paper, which we call Part 1.

1. NON-DETERMINISTIC STATE-STRATEGIES

We shall show how to associate with every program from the language introduced in Part 1
a strategy for the first player in the game environment defined above. Though it is
possible to consider non-deterministic strategies (in the usual sense), it turned out to be
convenient to deal with state-strategies similar to those used by Büchi. We modified Büchi's
notion of a state-strategy by making it non-deterministic.
FIG. 1. A run of a state-strategy (figure labels: strategic states, an initial state, a move of Prog, a move of Comp, transition table, output function).

A non-deterministic state-strategy for Prog in our game environment is the following finite-state
Moore automaton $F=\langle S,M,S_{in},P,f\rangle$, where

1. S is a set of strategic states;

2. $M:S\times\Sigma\to\mathcal{P}(S)$ is the transition table¹, where $M(s,\sigma)\cap P\neq\emptyset \iff M(s,\sigma)\subseteq P$;

3. $S_{in}\neq\emptyset$ is the set of initial states;

4. P is the (possibly empty) set of final states;

5. a function $f:S\to\Sigma^P$ is the strategic function, where for every state $s\in S$, $(s\in P \iff f(s)=end)$.
Intuitively, a state-strategy F for Prog works as follows. The automaton F moves along the
play accepting only moves of Comp, thereby changing its states according to the
transition table and producing a run. With every state s we associate the output f(s). Whenever a
position p of Prog with the resultant state s of the run is reached, Prog uses f(s) as its current
move. This is illustrated in Fig. 1. Since, however, a state-strategy may be non-deterministic and
therefore could have more than one run on p, we make the agreement that if we are using F in a play,
then we shall use a unique run of F on this play. This is clarified by the definitions of consistency and
perpetuality.
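A minimal Python sketch of consistency checking for a non-deterministic state-strategy. It follows the intuitive description above (F reads only Comp's moves, and each Prog move must equal f of the current state) rather than the indexed definition that follows, and it further assumes that Prog moves at even positions of a word and Comp at odd ones. Because several runs may exist, the sketch tracks the set of states that some candidate run can occupy, so a word is reported consistent iff some run from some initial state accommodates it. The class name StateStrategy and the toy automaton F are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Set, Tuple


@dataclass
class StateStrategy:
    """F = <S, M, S_in, P, f>; M maps (state, move) to a set of successor states."""
    states: Set[Hashable]
    trans: Dict[Tuple[Hashable, Hashable], Set[Hashable]]   # missing key = empty set
    initial: Set[Hashable]
    final: Set[Hashable]
    out: Dict[Hashable, Hashable]                          # strategic function f

    def consistent(self, word: List[Hashable]) -> bool:
        """True iff some run from some initial state is compatible with `word`."""
        current = set(self.initial)        # states that some candidate run can be in
        for i, move in enumerate(word):
            if i % 2 == 0:                 # Prog's move must equal f(current state)
                current = {s for s in current if self.out.get(s) == move}
            else:                          # Comp's move advances the run via M
                current = {t for s in current for t in self.trans.get((s, move), set())}
            if not current:
                return False
        return True


# Toy strategy: non-deterministically submit x:=1 or x:=2, then end.
F = StateStrategy(
    states={"a", "b", "done"},
    trans={("a", "x:=1"): {"done"}, ("b", "x:=2"): {"done"},
           ("a", "wait"): {"a"}, ("b", "wait"): {"b"}},
    initial={"a", "b"},
    final={"done"},
    out={"a": frozenset({"x:=1"}), "b": frozenset({"x:=2"}), "done": "end"},
)
print(F.consistent([frozenset({"x:=2"}), "x:=2", "end"]))  # True
```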

Let $\alpha_0\dots\alpha_n\in\Sigma^*$ and $s\in S_{in}$. We say that $\alpha_0\dots\alpha_n$ is s-consistent with F if there is an s-run
$s_0s_1\dots s_n$ of F on $\alpha_0\dots\alpha_{n-1}$ s.t. $s_0=s$, $f(s_0)=\alpha_0$ and for all $i\in[1,n-1]$, if $\alpha_0\dots\alpha_i\in\mathrm{Pos}(\mathrm{Prog})$ then
$f(s_{i+1})=\alpha_{i+1}$. In this case we call $s_0s_1\dots s_n$ a strategic run of F on $\alpha_0\dots\alpha_{n-1}$. It is easy to see
that the empty string $\lambda$ is s-consistent with F and that any initial state constitutes a strategic run on
$\lambda$. We say that $\alpha_0\dots\alpha_n$ is consistent with F if $\alpha_0\dots\alpha_n$ is s-consistent with F for some
$s\in S_{in}$. Similarly we define s-consistency and consistency on infinite strings. Note that if $\alpha$ is a
finite string and there is a strategic run of F on $\alpha$, then $\alpha$ is consistent with F, but the
converse may not be true. For infinite strings, consistency and the existence of a strategic run are
equivalent notions.

¹ We shall sometimes write $M(s,\sigma)\to s'$ instead of $s'\in M(s,\sigma)$.

We say that the state-strategy F is perpetual if for every position p and every strategic run r
of F on p (if any) the following is true. If $p\in\mathrm{Pos}(\mathrm{Comp})$, then for any legal move $\sigma$ of Comp
from the position p, the run r can be continued on $p\cdot\sigma$. If $p\in\mathrm{Pos}(\mathrm{Prog})$ and s is the resultant
state of r, then f(s) is defined, f(s) is a legal move of Prog from the position p, and if $f(s)\neq end$ then
the run r can be continued on $p\cdot f(s)$.

Suppose we have a game $\Gamma=\langle\Sigma,T,\mathrm{Prog},W\rangle$ which is a part of the game environment. We say
that the state-strategy F is conditionally winning over $\Gamma$ if for every play $\mu$ consistent with F we
have $\mu\in W$. We say that the state-strategy F is winning over $\Gamma$ if F is perpetual and is
conditionally winning over $\Gamma$.

Let $F=\langle S,M,S_{in},P,f\rangle$ and $F'=\langle S',M',S'_{in},P',f'\rangle$ be state-strategies and suppose that all the states
from S and S' are reachable. An injection $\varphi:S\to S'$ is called a homomorphism from F into F' if
there is a relabelling g s.t. $g\circ f=f'\circ\varphi$ (in the sense that $g\circ f$ is defined iff $f'\circ\varphi$ is defined),
$\varphi(S_{in})\subseteq S'_{in}$, $\varphi(P)\subseteq P'$, and for all $s,t\in S$ and $\sigma\in\Sigma$:
$(M(s,\sigma)\to t \Rightarrow M'(\varphi(s),g(\sigma))\to\varphi(t))\ \wedge\ (\sigma \text{ is a move of Comp and } M'(\varphi(s),g(\sigma))\neq\emptyset \Rightarrow M(s,\sigma)\neq\emptyset)\ \wedge\ (M'(\varphi(s),g(f(s)))\neq\emptyset \Rightarrow M(s,f(s))\neq\emptyset)$.
If there is a homomorphism from F into F', we say that F is a refinement of F'. Since the set of
strategic states is finite, it is easy to see that $\varphi:F\to F'$ is an isomorphism if and only if $\varphi:S\to S'$ is a
bijection.

PROPOSITION 1. Let F be a refinement of F'. Then if F' is perpetual (conditionally
winning), so is F.

■

2.  AN  ALGEBRA  OF  STATE-STRATEGIES 

Now we can build a calculus of strategies. In the definitions below, $F_0=\langle S_0,M_0,S_{0in},P_0,f_0\rangle$
and $F_1=\langle S_1,M_1,S_{1in},P_1,f_1\rangle$ are arbitrary non-deterministic state-strategies for Prog and
$F=\langle S,M,S_{in},P,f\rangle$ is the one being defined, unless specified otherwise. We assume that $S_0$ is
disjoint from $S_1$ (which can be achieved by renaming the states) and that $s_0$ and $s_1$ are
respectively elements of $S_0$ and $S_1$, unless specified otherwise. We also assume that $f_0(S_0)$ is
disjoint from $f_1(S_1)$, which can be achieved by relabelling the instructions.

Atomic Strategies.

Let $a\in A$. The automaton [a] consists of the following components.

1. S = {(output a), (wait for a), (a is finished)};

2. M((output a), wait) = (wait for a),
M((output a), a) = (a is finished),
M((wait for a), wait) = (wait for a),
M((wait for a), a) = (a is finished);

3. $S_{in}$ = {(output a)};

4. P = {(a is finished)};

5. f((output a)) = {a}, f((wait for a)) = skip, f((a is finished)) = end.
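The atomic strategy [a] can be written down directly as data. In the following Python sketch (our own encoding, not the paper's), a state-strategy is a named tuple SS, skip is represented by the empty frozenset, and the hypothetical helper respond replays a deterministic state-strategy against a given sequence of Comp moves and returns the Prog moves it produces.

```python
from typing import Dict, Hashable, List, NamedTuple, Set, Tuple

SKIP = frozenset()  # the empty move of Prog


class SS(NamedTuple):
    """Hypothetical encoding of a state-strategy <S, M, S_in, P, f>."""
    S: Set[Hashable]
    M: Dict[Tuple[Hashable, Hashable], Set[Hashable]]
    S_in: Set[Hashable]
    P: Set[Hashable]
    f: Dict[Hashable, object]


def atomic(a: str) -> SS:
    """The automaton [a]: submit a, wait for it, then end."""
    out, wait, fin = ("output", a), ("wait for", a), ("finished", a)
    M = {(out, "wait"): {wait}, (out, a): {fin},
         (wait, "wait"): {wait}, (wait, a): {fin}}
    f = {out: frozenset({a}), wait: SKIP, fin: "end"}
    return SS({out, wait, fin}, M, {out}, {fin}, f)


def respond(F: SS, comp_moves: List[object]) -> List[object]:
    """Prog's moves produced by a deterministic F against given Comp moves."""
    (s,) = F.S_in                   # deterministic: one initial state
    moves = [F.f[s]]
    for c in comp_moves:
        if moves[-1] == "end":
            break
        (s,) = F.M[(s, c)]          # deterministic: one successor
        moves.append(F.f[s])
    return moves


print(respond(atomic("x:=1"), ["wait", "wait", "x:=1"]))
# [frozenset({'x:=1'}), frozenset(), frozenset(), 'end']
```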


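Composition $F_0;F_1$.

1. $S=(S_0-P_0)\cup S_1$;

2. if $P_0\cap M_0(s_0,\sigma)=\emptyset$ then $M(s_0,\sigma)=M_0(s_0,\sigma)$;
if $P_0\cap M_0(s_0,\sigma)\neq\emptyset$ then $M(s_0,\sigma)=S_{1in}$; and $M(s_1,\sigma)=M_1(s_1,\sigma)$;

3. $S_{in}=S_{0in}$;

4. $P=P_1$;

5. $f(s_0)=f_0(s_0)$ and $f(s_1)=f_1(s_1)$.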
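A Python sketch of the composition $F_0;F_1$, reusing an ad hoc encoding (the named tuple SS and the helpers atomic and respond are our own hypothetical names, repeated so that the block runs on its own). Transitions of $F_0$ that would enter a final state are redirected to the initial states of $F_1$; like the definition, the sketch relies on $M_0(s,\sigma)$ meeting $P_0$ only when it is contained in $P_0$, and on the state sets being disjoint.

```python
from typing import Dict, Hashable, List, NamedTuple, Set, Tuple

SKIP = frozenset()  # the empty move of Prog


class SS(NamedTuple):
    S: Set[Hashable]
    M: Dict[Tuple[Hashable, Hashable], Set[Hashable]]
    S_in: Set[Hashable]
    P: Set[Hashable]
    f: Dict[Hashable, object]


def atomic(a: str) -> SS:
    out, wait, fin = ("output", a), ("wait for", a), ("finished", a)
    M = {(out, "wait"): {wait}, (out, a): {fin}, (wait, "wait"): {wait}, (wait, a): {fin}}
    return SS({out, wait, fin}, M, {out}, {fin}, {out: frozenset({a}), wait: SKIP, fin: "end"})


def respond(F: SS, comp_moves: List[object]) -> List[object]:
    (s,) = F.S_in
    moves = [F.f[s]]
    for c in comp_moves:
        if moves[-1] == "end":
            break
        (s,) = F.M[(s, c)]
        moves.append(F.f[s])
    return moves


def compose(F0: SS, F1: SS) -> SS:
    """F0;F1: drop F0's final states and redirect transitions into them to F1's initial states."""
    M: Dict[Tuple[Hashable, Hashable], Set[Hashable]] = {}
    for (s, sigma), targets in F0.M.items():
        if s in F0.P:
            continue
        M[(s, sigma)] = set(F1.S_in) if targets & F0.P else set(targets)
    M.update({key: set(targets) for key, targets in F1.M.items()})
    f = {s: F0.f[s] for s in F0.S - F0.P}
    f.update(F1.f)
    return SS((F0.S - F0.P) | F1.S, M, set(F0.S_in), set(F1.P), f)


G = compose(atomic("x:=0"), atomic("y:=0"))   # begin x:=0; y:=0 end
print(respond(G, ["x:=0", "wait", "y:=0"]))
# [frozenset({'x:=0'}), frozenset({'y:=0'}), frozenset(), 'end']
```

In this encoding, Prog submits y:=0 in the very move after Comp executes x:=0, since the final state of [x:=0] is skipped over.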

The following proposition shows that the result of composition behaves as a sequential
application of strategies.

PROPOSITION 2. Suppose $F_0;F_1=F$ and $v\in V$. Then a position p associated with a machine
state v is consistent with F iff p is non-terminal and consistent with $F_0$, or there are positions $q_0$
and $q_1$ associated respectively with machine states $v_0$ and $v_1$ s.t. $v=v_0$, $v_1$ is the resultant
state of the $v_0$-run of Exec on $q_0$, $q_0\cdot end$ is consistent with $F_0$, $q_1$ is consistent with $F_1$ and
$p=q_0\cdot q_1$.

■

PROPOSITION  3.  The  composition  is  associative  up  to  an  isomorphism. 


Concurrent connection $F_0\|F_1$.

1. $S=S_0\times S_1$;

2. Suppose that $\sigma$ is a non-empty move of Comp. Then if $M_0(s_0,\sigma)\to t$ then
$M(\langle s_0,s_1\rangle,\sigma)\to\langle t,s_1\rangle$, and if $M_1(s_1,\sigma)\to t$ then
$M(\langle s_0,s_1\rangle,\sigma)\to\langle s_0,t\rangle$.

If $s_0$ and $s_1$ are not final, then
$M(\langle s_0,s_1\rangle,wait)=M_0(s_0,wait)\times M_1(s_1,wait)$;
else if $s_0$ is not final, then
$M(\langle s_0,s_1\rangle,wait)=M_0(s_0,wait)\times\{s_1\}$;
else if $s_1$ is not final, then
$M(\langle s_0,s_1\rangle,wait)=\{s_0\}\times M_1(s_1,wait)$;

3. $S_{in}=S_{0in}\times S_{1in}$;

4. $P=P_0\times P_1$;

5. If at least one of $s_0$ and $s_1$ is not final, then $f(\langle s_0,s_1\rangle)=f_0(s_0)\cup f_1(s_1)$, where at most
one of the functions is undefined. In this case we treat the undefined function (if any) as
if it gives skip. Also in this case we treat end as skip. If $s_0$ and $s_1$ are both final,
then $f(\langle s_0,s_1\rangle)=end$.
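A Python sketch of the product construction for $F_0\|F_1$ under the same ad hoc encoding (SS, atomic and respond are hypothetical helpers repeated for self-containment). A non-wait Comp move advances whichever component can take it, wait advances every component that is not yet final, and the output is the union of the component outputs with end and undefined outputs read as skip.

```python
from itertools import product
from typing import Dict, Hashable, List, NamedTuple, Set, Tuple

SKIP = frozenset()  # the empty move of Prog


class SS(NamedTuple):
    S: Set[Hashable]
    M: Dict[Tuple[Hashable, Hashable], Set[Hashable]]
    S_in: Set[Hashable]
    P: Set[Hashable]
    f: Dict[Hashable, object]


def atomic(a: str) -> SS:
    out, wait, fin = ("output", a), ("wait for", a), ("finished", a)
    M = {(out, "wait"): {wait}, (out, a): {fin}, (wait, "wait"): {wait}, (wait, a): {fin}}
    return SS({out, wait, fin}, M, {out}, {fin}, {out: frozenset({a}), wait: SKIP, fin: "end"})


def respond(F: SS, comp_moves: List[object]) -> List[object]:
    (s,) = F.S_in
    moves = [F.f[s]]
    for c in comp_moves:
        if moves[-1] == "end":
            break
        (s,) = F.M[(s, c)]
        moves.append(F.f[s])
    return moves


def concurrent(F0: SS, F1: SS) -> SS:
    """F0 || F1 as a product automaton (states of F0 and F1 assumed disjoint)."""
    S = set(product(F0.S, F1.S))
    M: Dict[Tuple[Hashable, Hashable], Set[Hashable]] = {}
    comp_moves = ({c for _, c in F0.M} | {c for _, c in F1.M}) - {"wait"}
    for s0, s1 in S:
        pair = (s0, s1)
        for c in comp_moves:        # a non-empty Comp move advances one component
            succ = {(t, s1) for t in F0.M.get((s0, c), set())}
            succ |= {(s0, t) for t in F1.M.get((s1, c), set())}
            if succ:
                M[(pair, c)] = succ
        w0 = F0.M.get((s0, "wait"), set())
        w1 = F1.M.get((s1, "wait"), set())
        if s0 not in F0.P and s1 not in F1.P:
            M[(pair, "wait")] = set(product(w0, w1))
        elif s0 not in F0.P:
            M[(pair, "wait")] = set(product(w0, {s1}))
        elif s1 not in F1.P:
            M[(pair, "wait")] = set(product({s0}, w1))
    f: Dict[Hashable, object] = {}
    for s0, s1 in S:
        if s0 in F0.P and s1 in F1.P:
            f[(s0, s1)] = "end"
        else:                       # undefined output and end both count as skip
            parts = (F0.f.get(s0), F1.f.get(s1))
            f[(s0, s1)] = frozenset().union(*(p for p in parts if isinstance(p, frozenset)))
    return SS(S, M, set(product(F0.S_in, F1.S_in)), set(product(F0.P, F1.P)), f)


H = concurrent(atomic("x:=1"), atomic("x:=2"))   # cobegin x:=1, x:=2 coend
moves = respond(H, ["wait", "x:=2", "x:=1"])
```

Read literally, the product keeps an unfinished component in its (output a) state until Comp acts on it, so after x:=1 alone completes, Prog submits x:=2 again; this is harmless if submitting an already pending instruction has no further effect (an assumption of this sketch).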

The following definitions and propositions show the intuitive behavior of the concurrent
connection.

Suppose $p$, $q_0$ and $q_1$ are strings. We say that $p$ is an interleaving of $q_0$ and $q_1$ if there
are order-preserving maps $h_i : q_i \to p$ for $i \in \{0, 1\}$ s.t. $h_i(\text{skip}) = \text{skip}$, $h_i(\text{wait}) = \text{wait}$, $p = h_0(q_0) \cup h_1(q_1)$,
and if $\alpha_i$ is an occurrence of a non-empty character in $q_i$ for $i \in \{0, 1\}$, then $h_0(\alpha_0) \neq h_1(\alpha_1)$.
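A minimal Python sketch of the interleaving test. It assumes one reading of the definition: an occurrence of skip or wait in $p$ may be the common image of one occurrence from each of $q_0$ and $q_1$, while occurrences of non-empty characters must go to distinct positions of $p$. Strings are modelled as sequences of tokens, and `is_interleaving` is a hypothetical helper; memoizing the search over the three indices keeps it polynomial.

```python
from functools import lru_cache

SHARED = {"skip", "wait"}  # characters both maps may send to the same occurrence

def is_interleaving(p, q0, q1):
    """True if p is an interleaving of q0 and q1 (sketch of the definition above).

    The order-preserving maps h0, h1 are built left to right; together they
    must cover every position of p and use up both q0 and q1.
    """
    p, q0, q1 = tuple(p), tuple(q0), tuple(q1)

    @lru_cache(maxsize=None)
    def go(k, i, j):
        if k == len(p):
            return i == len(q0) and j == len(q1)
        c = p[k]
        if i < len(q0) and q0[i] == c and go(k + 1, i + 1, j):
            return True
        if j < len(q1) and q1[j] == c and go(k + 1, i, j + 1):
            return True
        # a skip/wait of p may be the image of one occurrence from each string
        return (c in SHARED and i < len(q0) and j < len(q1)
                and q0[i] == c and q1[j] == c and go(k + 1, i + 1, j + 1))

    return go(0, 0, 0)
```

For example, `["a", "wait", "b"]` is an interleaving of `["a", "wait"]` and `["wait", "b"]` through a shared wait, whereas `["a"]` is not an interleaving of `["a"]` and `["a"]`.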

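PROPOSITION 4. Suppose $p = \sigma_0 \cdot a_0 \cdot \text{wait} \cdots \sigma_{n-2} \cdot a_{n-2} \cdot \text{wait} \cdot \sigma_{n-1} \cdot a_{n-1}$, where for all
$i \in n$, $a_i \in \mathcal{P}$ and $\sigma_i$ is a string in $(\Sigma - \{\text{wait}\})^*$, is a non-terminal position consistent with $F_0 \| F_1$.

Then there are non-executable positions $q_0 = \sigma'_0 \cdot a'_0 \cdot \text{wait} \cdots \sigma'_{k-2} \cdot a'_{k-2} \cdot \text{wait} \cdot \sigma'_{k-1} \cdot a'_{k-1}$ and
$q_1 = \sigma''_0 \cdot a''_0 \cdot \text{wait} \cdots \sigma''_{m-2} \cdot a''_{m-2} \cdot \text{wait} \cdot \sigma''_{m-1} \cdot a''_{m-1}$ s.t. $n = \max(k, m)$, for all $i \in n$ $a_i = a'_i \cup a''_i$
and $\sigma_i$ is an interleaving of $\sigma'_i$ and $\sigma''_i$, where if $i > k-1$ (or $i > m-1$) then $\sigma'_i$ (or $\sigma''_i$) is
identified with $\emptyset$ and $a'_i$ (or $a''_i$) is identified with $\lambda$; $q_0$ is consistent with $F_0$ and $q_1$ is
consistent with $F_1$. Moreover, if $p \cdot \text{end}$ is consistent with $F_0 \| F_1$ then $q_0 \cdot \text{end}$ is consistent with
$F_0$ and $q_1 \cdot \text{end}$ is consistent with $F_1$.

■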

PROPOSITION  5.  The  concurrent  connection  is  commutative  and  associative  up  to  an 
isomorphism  of  state-strategies. 


Conditional if $e$ then $F_0$ else $F_1$

Let $e$ be a Boolean expression.

1. $S = S_0 \cup S_1 \cup \{(\text{output } e), (\text{wait for } e)\}$;

2. If $s \in \{(\text{output } e), (\text{wait for } e)\}$ then
$M(s, \text{wait}) = (\text{wait for } e)$,
$M(s, (e, \text{true})) = S_{0in}$,
$M(s, (e, \text{false})) = S_{1in}$;
for $s_j \in S_j$, $M(s_j, a) = M_j(s_j, a)$;

3. $S_{in} = \{(\text{output } e)\}$;

4. $P = P_0 \cup P_1$;

5. $f((\text{output } e)) = \{e\}$, $f(\text{wait for } e) = \text{skip}$, $f(s_0) = f_0(s_0)$ and $f(s_1) = f_1(s_1)$.

The  following  proposition  shows  that  the  conditional  behaves  as  intuitively  expected. 
PROPOSITION 6. Suppose if $e$ then $F_0$ else $F_1 = F$, $v \in V$ and $e_v$ (i.e. the evaluation of $e$ in $v$) is
defined. Then a position $p$ associated with a machine state $v$ is consistent with $F$ iff $p$ is not
terminal and it is consistent with $[e]$, or the following is true. If $e_v = \text{true}$ then there are positions $p'$
and $q_0$ associated with machine state $v$ s.t. $p' \cdot \text{end}$ is consistent with $[e]$ and $q_0$ is consistent with
$F_0$. If $e_v = \text{false}$ then there are positions $p'$ and $q_1$ associated with machine state $v$ s.t. $p' \cdot \text{end}$ is
consistent with $[e]$ and $q_1$ is consistent with $F_1$.


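Loop while $e$ do $F_0$

1. $S = S_0 \cup \{(\text{output } e), (\text{wait for } e), \text{finish}\}$;

2. If $s \in \{(\text{output } e), (\text{wait for } e)\} \cup P_0$ then
$M(s, \text{wait}) = (\text{wait for } e)$,
$M(s, (e, \text{true})) = S_{0in}$,
$M(s, (e, \text{false})) = \text{finish}$.
If $s_0 \in S_0 - P_0$ then
$M(s_0, a) = M_0(s_0, a)$;

3. $S_{in} = \{(\text{output } e)\}$;

4. $P = \{\text{finish}\}$;

5. For $s \in \{(\text{output } e)\} \cup P_0$, $f(s) = e$; $f(\text{wait for } e) = \text{skip}$; for $s_0 \in S_0 - P_0$, $f(s_0) = f_0(s_0)$;
$f(\text{finish}) = \text{end}$.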

The  following  proposition  shows  that  the  loop  behaves  as  intuitively  expected. 
PROPOSITION 7. Suppose while $e$ do $F_0 = F$ and $v_0$ is a machine state. Then a position $p$
with a $v_0$-run $r$ of Exec is consistent with $F$ iff there are positions $p_0, q_0, \ldots, p_{n-1}, q_{n-1}$,
where $p_{n-1}$ is not empty and $q_{n-1}$ may be empty, and states $v_0, \ldots, v_{n-1}$ s.t. $p_i, q_i$ are
associated with $v_i$, which is the resultant state of $r$ on $p_0 \cdot q_0 \cdots p_{i-1} \cdot q_{i-1}$; also
$p_i$ is consistent with $[e]$ and $q_i$ is consistent with $F_0$ for $i \in n$, and $p = p_0 \cdot q_0 \cdots p_{n-1} \cdot q_{n-1}$.
Moreover, $p$ is terminal iff $p_{n-1}$ is terminal and $q_{n-1}$ is empty.

■

3.  HOW  TO  LOOK  FOR  WINNING  STRATEGIES 
Priorities  and  the  Strategies  Based  on  Them. 

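Let $A$ be a finite alphabet and $T \subseteq A^*$. The sets $C_1, \ldots, C_n \subseteq T$ are called basic GH sets. We
define a function which extracts from a position the history of meeting the basic GH sets. Let
$\Sigma = \{0,1\}^n$. Define the coding function $\mathrm{Code} : T \to \Sigma$ by $\mathrm{Code}(p)_i = 1$ if $p \in C_i$ and $\mathrm{Code}(p)_i = 0$
otherwise. This map is converted into $\mathrm{Code}' : T \to \Sigma^*$ by putting $\mathrm{Code}'(e) = \mathrm{Code}(e)$ and
$\mathrm{Code}'(p \cdot a) = \mathrm{Code}'(p) \cdot \mathrm{Code}(p \cdot a)$.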
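For concreteness, Code and Code' can be sketched in Python, assuming positions are Python strings and each basic GH set $C_i$ is a finite Python set of positions. Both are illustrative assumptions; in the paper the $C_i$ need not be finite.

```python
def code(p, basic_sets):
    """Code(p): the letter of {0,1}^n whose i-th bit records whether p ∈ C_i."""
    return tuple(1 if p in C else 0 for C in basic_sets)

def code_prime(p, basic_sets):
    """Code'(p): Code applied to every prefix of p, shortest first.

    Code'(e) = Code(e) and Code'(p·a) = Code'(p)·Code(p·a), so the result
    has len(p) + 1 letters.
    """
    return [code(p[:k], basic_sets) for k in range(len(p) + 1)]
```

With $C_1 = \{ab\}$ and $C_2 = \{a, abc\}$, the word $\mathrm{Code}'(abc)$ is $(0,0)(0,1)(1,0)(0,1)$: one letter for each of the prefixes $e$, $a$, $ab$, $abc$.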

We'll explain one of the ways we combine several strategies into one. Let $\mathfrak{A} = \langle S, S_0, M\rangle$ be a
finite non-deterministic automaton over the alphabet $\Sigma$. A run of $\mathfrak{A}$ on $\Sigma^*$ is a function $r : \Sigma^* \to S$
s.t. $r(e) \in S_0$ and for all $\alpha \in \Sigma^*$ and $a \in \Sigma$, $r(\alpha \cdot a) \in M(r(\alpha), a)$. We combine a given finite
collection $\{f_s : s \in S\}$ of strategies into a strategy $f$ using $\Sigma^*$-runs of $\mathfrak{A}$ as follows. Suppose $r$
is a run of $\mathfrak{A}$. For any $p \in \mathrm{Pos}(\Omega)$ let $f(p) = f_{r(\mathrm{Code}'(p))}(p)$. We shall call $f$ a strategy induced by
$\mathfrak{A}$. Such strategies are also called induced by automata (relative to $C_1, \ldots, C_n$). Thus for any $\mathfrak{A}$
and $\{f_s : s \in S\}$ as above we have a class of strategies parametrized by runs of $\mathfrak{A}$.
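A sketch of a strategy induced by an automaton, assuming the automaton is given by a set of initial states and a transition function, and the strategies $f_s$ are Python callables (all names here are hypothetical). Fixing a deterministic selector such as `min` is one simple way to pick a single run $r$: it always makes the same choice for the same history, so the states it produces form a function on $\Sigma^*$.

```python
def code_prime(p, basic_sets):
    """Code'(p) as a list of bit tuples, one per prefix of p (including e)."""
    return [tuple(1 if p[:k] in C else 0 for C in basic_sets)
            for k in range(len(p) + 1)]

def induced_strategy(initial, transition, strategies, basic_sets, choose=min):
    """Return f with f(p) = f_{r(Code'(p))}(p) for one fixed run r (sketch).

    initial    - the set S_0 of initial automaton states
    transition - function (state, letter) -> set of successor states (M)
    strategies - dict mapping each automaton state s to a strategy f_s
    choose     - deterministic selector used to fix the run r
    """
    def run(word):
        state = choose(initial)                        # r(e) ∈ S_0
        for letter in word:
            state = choose(transition(state, letter))   # r(α·a) ∈ M(r(α), a)
        return state

    def f(p):
        return strategies[run(code_prime(p, basic_sets))](p)

    return f
```

In the usage below, a two-state automaton switches its goal whenever the current goal set is met, so the induced strategy alternates between the two component strategies.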

There is a special class of automata which we are going to use in constructing the strategies. The
purpose of such automata is to construct a strategy which allows us to meet every $C_i$ infinitely often.

While making moves, we are building a run $r$ of $\mathfrak{A}$ over the Code'-image of the play. At any
position $p$ of the play our immediate goal is to reach $C_{h(r(\mathrm{Code}'(p)))}$. First, we wish to guarantee
that the goal is not changed until it is fulfilled. Second, we wish to guarantee that if for the run $r$ the
goal is reached infinitely often, then the play meets every $C_i$, $i \in \{1, \ldots, n\}$, infinitely often.

Moreover, we would like to ensure a "fair"¹ treatment of any $C_i$ in the sense that for some fixed
$m \in \omega$, we should not be able to reach all the other sets more than $m$ times in a row without reaching
$C_i$ in between. The following definition satisfies these requirements.

Let $\mathfrak{A}$ be a finite non-deterministic Moore automaton with the set of states $S$, the output
function $h : S \to \{1, \ldots, n\}$ and the transition table $M$, accepting all words in the alphabet $\{0,1\}^n$. If $r$
is a run of $\mathfrak{A}$, $\alpha$ is a word over $\{0,1\}^n$, $b \in \{0,1\}^n$ and $b_{h(r(\alpha))} = 1$, we say that $r$ reaches the
goal on $\alpha \cdot b$.

We call $\mathfrak{A}$ a priority automaton if its output function, called the priority function of $\mathfrak{A}$,
satisfies the following two properties.

1. For any state $s$ and any letter $b \in \{0,1\}^n$, if $b_{h(s)} = 0$ then for any $s' \in M(s, b)$, $h(s') = h(s)$;

2. There is some $m \in \omega$ (which we call a bound) s.t. for any run $r$ of $\mathfrak{A}$ and words $\alpha \le \beta$, if
$r$ reaches its goal at least $m$ times on $\{\beta' : \alpha \le \beta' \le \beta\}$ then $h$ takes all the values from $\{1, \ldots, n\}$
on $r(\{\beta' : \alpha \le \beta' \le \beta\})$.

¹This is similar to the "fair" treatment of concurrent processes in Computer Science.




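THEOREM 1. Let $Q \subseteq T$, let $\mathfrak{A}$ be a priority automaton with an output function $h$, let $F_i = \mathrm{Decr}(C^i - Q, \Omega)$
with $F_i(q) \neq \emptyset \Rightarrow q \in \mathrm{Dom}^1(C^i - Q, \Omega)$ for $i \in \{1, \ldots, n\}$, and let $G_1, \ldots, G_n$ be $\Omega$-strategies such that

1. $e \in Q$;

2. For $i \in n$ and any position $q \notin Q \cup \mathrm{Dom}^1(C^i - Q, \Omega)$, $G_i$ wins $T'$ against $\mathrm{Avoid}(C^i - Q, 1 - \Omega)$.

Then for any run $r$ of $\mathfrak{A}$, the following strategy $f$ wins $T$ from every position $p \in Q$.

Let $p \in T$ and $i = h(r(\mathrm{Code}'(p)))$. If $p \in \mathrm{Dom}^1(C^i - Q, \Omega)$, define $f(p) = F_i(p)$, and otherwise
define $f(p) = G_i(p)$. See the Appendix for the notions of $\mathrm{Dom}^1$, Decr and Avoid.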


Gurevich-Harrington's LAR and examples of non-deterministic priority automata


The notion of priority automata is a generalization of GH's Latest Appearance Record (LAR).
Our version of LAR is a somewhat modified form of the LAR from [GH].

The alphabet of LAR(n) and all other priority automata in this section is $\Sigma = \{0,1\}^n$. Let
$\mathrm{Order}(n) = \{s \in \{1, \ldots, n\}^* : \text{for all } i \in \{1, \ldots, n\},\ i \text{ occurs in } s \text{ at most once}\}$. Note that Order(n)
includes the empty word $e$. Order(n) is going to be the set of states of LAR(n). Let us define the
transition table $M$.

Let $\sigma \in \Sigma$ and $s \in \mathrm{Order}(n)$. Let $X = \{i : \sigma_i = 1\}$, let $s' \in \{1, \ldots, n\}^*$ be the increasing sequence of
elements of $X$, and let $s''$ be the result of crossing out of $s$ all elements of $X$. We define
$M(s, \sigma) = s'' \cdot s'$. The output function $h$ is defined by $h(e) = 0$ and $h(s) = \mathrm{First}(s)$ if $s \neq e$.
PROPOSITION  3.1.  LAR(n)  is  bounded  by  n. 

■ 

LAR(n)  is  a  deterministic  priority  automaton. 
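The LAR(n) transition and output functions translate directly into code. In this sketch states are tuples over $\{1, \ldots, n\}$ and letters are bit tuples $\sigma$ of length $n$; these representation choices and the function names are ours.

```python
def lar_step(s, sigma):
    """M(s, σ) for LAR(n): the indices met by σ move, in increasing order, to the end."""
    X = [i for i, bit in enumerate(sigma, start=1) if bit == 1]  # s': increasing
    rest = tuple(i for i in s if i not in X)                     # s'': s with X crossed out
    return rest + tuple(X)

def lar_output(s):
    """h(e) = 0 and h(s) = First(s) for s ≠ e."""
    return s[0] if s else 0
```

Property 1 of a priority automaton is visible here: if $\sigma_{h(s)} = 0$, the first element of $s$ is not crossed out, so it stays first.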

The following priority automata are non-deterministic, and both were discovered by game-theoretic
analysis of well-known concurrent programs. MOD(n) stems from Eisenberg and
McGuire's algorithm (see [PS]) and SUB(n) stems from Morris' algorithm (see [MOR]).

Our next priority automaton is designated MOD(n), from "modulo n". Its set of states is
$\{1, \ldots, n\}$. The transition table is defined as follows. For $s \in \{1, \ldots, n\}$ and $\sigma \in \{0,1\}^n$: $M(s, \sigma) = \{s\}$
if $\sigma_s = 0$; $M(s, \sigma) = \{1, \ldots, n\}$ if $\sigma_j = 1$ for all $j \in \{1, \ldots, n\}$; and otherwise $M(s, \sigma) = \{j\}$, where
$(j - s) \bmod n = \min\{(k - s) \bmod n : k \in \{1, \ldots, n\},\ \sigma_k = 0\}$. The output function is the identity function on
$\{1, \ldots, n\}$.

PROPOSITION  3.2.  MOD(n)  is  bounded  by  n. 

■ 
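A sketch of MOD(n) under the same representation of letters as bit tuples. It assumes, as the case split suggests, that the minimum in the last case ranges over the indices $k$ with $\sigma_k = 0$, so that the goal passes cyclically to the next set not yet met; `mod_step` is an illustrative name.

```python
def mod_step(s, sigma):
    """Successor set M(s, σ) of MOD(n) for s in {1..n}, σ in {0,1}^n (sketch)."""
    n = len(sigma)
    if sigma[s - 1] == 0:            # current goal C_s not met: keep it
        return {s}
    zeros = [k for k in range(1, n + 1) if sigma[k - 1] == 0]
    if not zeros:                    # every C_j met at once: any state
        return set(range(1, n + 1))
    # next unmet index after s, counting cyclically modulo n
    return {min(zeros, key=lambda k: (k - s) % n)}
```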

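The last example of a priority automaton is designated SUB(n), from "subset of n". Its set of
states is $S = \{\langle X, i\rangle : i \in X \subseteq \{1, \ldots, n\}\}$. The transition table is defined as follows. For $s = \langle X, i\rangle \in S$
and $\sigma \in \{0,1\}^n$, let $X' = \{j \in X : \sigma_j = 0\}$. Then $M(s, \sigma) = \{\langle X', i\rangle\}$ if $\sigma_i = 0$; $M(s, \sigma) = \{\langle X', j\rangle :
j \in X'\}$ if $X' \neq \emptyset$ and $\sigma_i = 1$; and $M(s, \sigma) = \{\langle \{1, \ldots, n\}, j\rangle : j \in \{1, \ldots, n\}\}$ otherwise. The output
function is $h(\langle X, i\rangle) = i$.

PROPOSITION 3.3. SUB(n) is bounded by $2n$.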
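SUB(n) can be sketched in the same style, representing a state $\langle X, i\rangle$ as a pair `(frozenset, int)`. The function names are illustrative.

```python
def sub_step(state, sigma):
    """Successor set M(<X, i>, σ) of SUB(n); states are (frozenset X, i) with i ∈ X."""
    X, i = state
    n = len(sigma)
    X1 = frozenset(j for j in X if sigma[j - 1] == 0)  # X': members of X not met by σ
    if sigma[i - 1] == 0:                               # goal C_i not met: keep it
        return {(X1, i)}
    if X1:                                              # goal met: pick any remaining member
        return {(X1, j) for j in X1}
    full = frozenset(range(1, n + 1))                   # X exhausted: restart with all indices
    return {(full, j) for j in full}

def sub_output(state):
    """h(<X, i>) = i."""
    return state[1]
```

Note that the invariant $i \in X$ is preserved: when $\sigma_i = 0$, the index $i$ survives into $X'$.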
4. THE MUTUAL EXCLUSION PROBLEM


In Part 1 of the sequel (see the abstract), the winning condition is represented by

W=([MmEidn(n.€nIN<2L2£l£i]))- 

Since  this  set  is  exactly  of  the  form  used  in  Theorem  1,  it  is  easy  to  see  that  the  theorem  provides 
a  class  of  winning  strategies  for  this  problem. 

Appendix.  RANKS,  DOMAINS  AND  STRATEGIES  BASED  ON  THEM 

Let $A$ be a finite alphabet and $T \subseteq A^*$ be a tree without leaves. We call $T$ a game tree and its
elements positions. If a position $p$ is a prefix of a position $q$ we shall write $p \le q$.

We shall consider games with the following rules. 0 and 1 are respectively the first and second
players. If $p$ is a position with even (odd) length then 0 (1) chooses a letter $a \in A$ s.t. $p \cdot a \in T$. We
designate the set of all even (odd) positions as Pos(0) (Pos(1)). A play is an infinite sequence
produced by the above rules. The collection of all plays is designated Play(T).

Let $X \subseteq T$ and $\Omega$ be a player. A position $q$ is a child of $p$ if $q = p \cdot a$ for some $a \in A$.

Denote by $p{\downarrow}$ the set of children of $p$. A non-deterministic strategy for $\Omega$ is a function
$f : \mathrm{Pos}(\Omega) \to \mathcal{P}(T)$ s.t. for any $p \in T$, $f(p)$ is a subset of the set of children of $p$. From now on a
'strategy' means a 'non-deterministic strategy', unless we say otherwise. The two simplest
strategies are called vacuous; they are defined by $\mathrm{Vac}(\Omega)(p) = p{\downarrow}$ and $\mathrm{Vac}(1-\Omega)(q) = q{\downarrow}$. We say
that a strategy $f$ is defined on $X \subseteq T$ if for any $p$ in $X$, $f(p) \neq \emptyset$, and for any $p$ outside $X$,
$f(p) = \emptyset$. For example, $\mathrm{Vac}(\Omega)$ is defined on $\mathrm{Pos}(\Omega)$ and $\mathrm{Vac}(1-\Omega)$ is defined on $\mathrm{Pos}(1-\Omega)$. We
say that a strategy $f$ is defined at least on $X \subseteq T$ if for any $p$ in $X$, $f(p) \neq \emptyset$, without any
supposition about the behavior of $f$ outside $X$. We say that an $\Omega$-strategy $f'$ is a refinement of $f$ if
for all $p \in \mathrm{Pos}(\Omega)$, $f'(p) \subseteq f(p)$. Informally, we shall write $f' \subseteq f$. The set of positions consistent with $f$
is defined as follows. The empty word $e$ is consistent with $f$. For any position $p$ consistent with $f$,
if $p \in \mathrm{Pos}(1-\Omega)$ then any child of $p$ is consistent with $f$, and if $p \in \mathrm{Pos}(\Omega)$ then any
$q \in f(p)$ is consistent with $f$. Consistency after position $p_0$ is defined by replacing $e$ above by $p_0$.

Consistency of plays is defined similarly. Sometimes we shall informally use the words 'reach', 'meet'
and so on instead of speaking in terms of consistency.

All the strategies considered here are based on the following concept of "$\Omega$-rank inside an $\Omega$-strategy
and against a $(1-\Omega)$-strategy". Let $f$ be an $\Omega$-strategy, $g$ be a $(1-\Omega)$-strategy and
$X \subseteq T$. We shall inductively define a partial function $\mathrm{Rank}(X, \Omega, f/g) : T \to \omega$ as follows.

1. For all $p \in X$, $\mathrm{Rank}(X, \Omega, f/g)(p) = 0$;

2. If $p \in \mathrm{Pos}(\Omega)$, $\mathrm{Rank}(X, \Omega, f/g)$ is defined on at least one child of $p$ from $f(p)$, and $n$ is
the minimal value of $\mathrm{Rank}(X, \Omega, f/g)$ on $f(p)$, then $\mathrm{Rank}(X, \Omega, f/g)(p) = n + 1$;

3. If $p \in \mathrm{Pos}(1-\Omega)$, $g(p) \neq \emptyset$, $\mathrm{Rank}(X, \Omega, f/g)$ is defined on all children of $p$ which are in
$g(p)$, and $n$ is the maximal value of $\mathrm{Rank}(X, \Omega, f/g)$ on $g(p)$, then $\mathrm{Rank}(X, \Omega, f/g)(p) = n + 1$.
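Because the rank is defined inductively, it can be computed stage by stage: at stage $n+1$, every position not yet ranked that satisfies clause 2 or 3 with respect to the positions ranked so far receives rank $n+1$. A position that first qualifies at stage $n+1$ cannot have a qualifying child of smaller rank (it would have qualified earlier), so the minimum in clause 2 and the maximum in clause 3 both equal $n$, as required. The sketch assumes a finite game graph with an explicit `owner` map, whereas the paper's $T$ is an infinite tree whose positions are owned according to the parity of their length; the names are hypothetical.

```python
def rank(positions, owner, omega, f, g, X):
    """Rank(X, Ω, f/g) on a finite game graph, computed stage by stage (sketch).

    positions - iterable of positions
    owner     - dict: position -> player (0 or 1) who moves there
    omega     - the player Ω
    f, g      - dicts: position -> set of children allowed by the Ω-strategy f
                and by the (1-Ω)-strategy g (a missing key means the empty set)
    X         - target set
    Returns a dict defined exactly where the partial function is defined.
    """
    positions = list(positions)
    r = {p: 0 for p in X}                               # clause 1
    n = 0
    while True:
        stage = {}
        for p in positions:
            if p in r:
                continue
            if owner[p] == omega:
                # clause 2: some child in f(p) is already ranked
                ok = any(q in r for q in f.get(p, ()))
            else:
                # clause 3: g(p) is non-empty and all of its children are ranked
                kids = g.get(p, set())
                ok = bool(kids) and all(q in r for q in kids)
            if ok:
                stage[p] = n + 1
        if not stage:
            return r
        r.update(stage)
        n += 1
```

In the usage below, the opponent position `b` is forced into `x` in one move, so `a` reaches `x` in two, while `c` only loops to itself and stays outside the domain of the rank.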

PROPOSITION 1.1. For any $X \subseteq T$, $p \in T$ and $n \ge 0$, $\mathrm{Rank}(X, \Omega, f/g)(p) = n$ iff there is a
strategy for $\Omega$ which is a refinement of $f$ and which allows $\Omega$ to reach $X$ starting from $p$ with at
most $n$ moves while $1-\Omega$ uses $g$ after $p$.
GH's original notion of "rank" can be represented in our notation in the form
$\mathrm{Rank}(X, \Omega, \mathrm{Vac}(\Omega)/\mathrm{Vac}(1-\Omega))$. They explored the idea of playing against a fixed strategy of the opponent
in their "Sewing Lemma". In contrast, we do not use the "Sewing Lemma", but rely on
a modification of GH's "domains" based on "rank inside an $\Omega$-strategy and against a $(1-\Omega)$-strategy".

It may appear at first that introducing $f$ and $g$ into ranks and the subsequent notions is
redundant, since $f$ and $g$ may form a subtree of $T$ and, in this subtree, our rank is just GH's
rank. However, closer observation shows that since $f$ and $g$ are not necessarily everywhere defined,
i.e. we allow $f(p) = \emptyset$ or $g(p) = \emptyset$, $f$ and $g$ do not necessarily form a subtree. Rather, they may
form many disjoint subtrees and finite parts of subtrees. Since we would like to define uniform
strategies in areas where $f$ and $g$ are defined, the notion of rank in full generality appears to be
useful.

We designate $\mathrm{Dom}(X, \Omega, f/g) = \{p \in T : \mathrm{Rank}(X, \Omega, f/g)(p) \ge 0\}$ and $\mathrm{Dom}^1(X, \Omega, f/g) = \{p \in T :
\mathrm{Rank}(X - \{p\}, \Omega, f/g)(p) \ge 1\}$.

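PROPOSITION 1.2. $\mathrm{Dom}^1(X, \Omega, f/g) \subseteq \mathrm{Dom}(X, \Omega, f/g)$,
$\mathrm{Dom}^1(X, \Omega, f/g) \cup X = \mathrm{Dom}(X, \Omega, f/g)$, $\mathrm{Dom}(\mathrm{Dom}(X, \Omega, f/g), \Omega, f/g) = \mathrm{Dom}(X, \Omega, f/g)$ and
$\mathrm{Dom}(\mathrm{Dom}^1(X, \Omega, f/g), \Omega, f/g) = \mathrm{Dom}^1(X, \Omega, f/g)$.

■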

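The following strategies are used as building blocks for all the strategies considered here.
$\mathrm{Decr}(X, \Omega, f/g)(p) = \{q : q \in f(p) \text{ and } \mathrm{Rank}(X - \{p\}, \Omega, f/g)(q) < \mathrm{Rank}(X - \{p\}, \Omega, f/g)(p)\}$ is an $\Omega$-strategy.
Note that if $q$ descends from $p$ then $\mathrm{Rank}(X - \{p\}, \Omega, f/g)(q) = \mathrm{Rank}(X, \Omega, f/g)(q)$.
$\mathrm{Avoid}(X, 1-\Omega, g/f)(p) = \{q : q \in g(p) \text{ and } q \notin \mathrm{Dom}(X, \Omega, f/g)\}$ is a $(1-\Omega)$-strategy.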
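Given a table of rank values, Decr and Avoid can be sketched as below. A child $q$ of $p$ has the same rank with respect to $X - \{p\}$ as with respect to $X$, and at an $\Omega$-position $p$ clause 2 makes $\mathrm{Rank}(X - \{p\}, \Omega, f/g)(p)$ one more than the least rank among the ranked children in $f(p)$. Reading the comparison in Decr as strict, the children kept are exactly those of least rank. The dict encoding is an assumption, and `decr` is meant only for $\Omega$-positions.

```python
def decr(p, f, rank):
    """Decr(X, Ω, f/g)(p) at an Ω-position p: the ranked children in f(p) of least rank.

    rank - dict of Rank(X, Ω, f/g) values (positions outside it are unranked)
    f    - dict: position -> set of children allowed by f
    """
    ranked = [q for q in f.get(p, ()) if q in rank]
    if not ranked:
        return set()
    least = min(rank[q] for q in ranked)
    return {q for q in ranked if rank[q] == least}

def avoid(p, g, rank):
    """Avoid(X, 1-Ω, g/f)(p): children in g(p) outside Dom(X, Ω, f/g)."""
    return {q for q in g.get(p, ()) if q not in rank}
```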

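PROPOSITION 1.3. If $p \in \mathrm{Dom}^1(X, \Omega, f/g)$ then $\mathrm{Decr}(X, \Omega, f/g)$ allows $\Omega$ to reach $X$ after $p$
if $1-\Omega$ uses $g$ after $p$. Moreover, while $\Omega$ is using $\mathrm{Decr}(X, \Omega, f/g)$ and $1-\Omega$ is using $g$, the
play stays inside $\mathrm{Dom}^1(X, \Omega, f/g)$ at least until it reaches $X$ for the first time after $p$. If
$p \notin \mathrm{Dom}^1(X, \Omega, f/g)$ then $\mathrm{Avoid}(X, 1-\Omega, g/f)$ allows $1-\Omega$ to never reach $\mathrm{Dom}(X, \Omega, f/g)$ (and hence
also $X$) after $p$ if $\Omega$ uses $f$ after $p$.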
UNSTABLE INTERFACES AND ANOMALOUS WAVES
IN COMPRESSIBLE FLUIDS

JOHN W. GROVE†, RALPH MENIKOFF‡, QIANG ZHANG§


Abstract. The gravitational acceleration of a heavier fluid into a lighter fluid causes
unstable modes to grow in the interface between the two fluids and leads to their
eventual chaotic mixing. This phenomenon is known as the Rayleigh-Taylor instability.
We discuss the development and validation of a model for the long term dynamics of
the mixing boundary layer. The model uses a simplified description of the dynamics
of the bubbles of lighter fluid rising in the heavier fluid. Validation of the model is
achieved by comparison with experiments and full scale two-dimensional simulations
of the mixing process.

We also discuss the production of anomalous waves during the interaction of shock
waves with fluid interfaces. The focus here is on the case when the shock passes from
a medium of high to low acoustic impedance. Curvature of either of the interacting
waves causes the diffraction patterns produced during the collision to bifurcate from
locally self-similar pseudo-stationary configurations to unsteady anomalous reflections.
This process is analogous to the transition from a regular to a Mach reflection
where the reflected wave is a rarefaction instead of a shock. These bifurcations are
incorporated into a front tracking code that gives an accurate description of the wave
interactions. Numerical results for two illustrative cases are described: a planar shock
passing over a bubble, and an expanding shock impacting a planar contact.


1.  Introduction. 

This report treats two aspects of computational fluid dynamics: the unstable
behavior of gravity-driven mixing, and the diffraction of shock waves through fluid
interfaces. The latter process is itself associated with a related instability, known
as the Richtmyer-Meshkov instability, that occurs in shock-accelerated interfaces.

The Rayleigh-Taylor instability is a fingering instability between two fluids with
different densities. If the interface between the two fluids is planar and perpendicular
to the direction of the applied external forces, then such a system is in a
state of unstable equilibrium when the light fluid supports the heavier. Any small
perturbation of the fluid interface will upset this unstable equilibrium, leading to
the formation of rising bubbles of the light fluid and falling spikes of the heavier.
As the mixing process develops, spikes can pinch off to form droplets.
The mixing of two fluids under the influence of gravity was first investigated by
Rayleigh [26] and later by Taylor [31]. Since then a variety of computational and
analytic methods have been used to study this classical problem. These include
nonlinear integral equations [4], [7], boundary integral techniques [33], conformal
mapping [21], dynamical modeling [12], [30], vortex-in-cell methods [32], [36],
higher order Godunov methods [34], and front tracking [8], [11], [13]. Most of this
work has been carried out for incompressible fluids or in the limit of a single fluid
in a vacuum. For a review of the Rayleigh-Taylor instability and its applications
to science and engineering, see reference [29]. We will present here results on the
behavior of a single unstable mode (one bubble), as well as the interaction of multiple
bubbles.

The front tracking method was used for the direct simulation of the mixing process.
We conducted a series of computational experiments for periodic arrays of
single bubbles (the single mode case) as well as for multiple bubble interactions.
Tracking the fluid interface offered several advantages: it eliminated numerical
diffusion at the interface, and it allowed an accurate measurement to be made of the
interface velocity.

Our  analysis  consisted  of  modeling  the  motion  of  the  tip  of  a  spike  or  bubble 
in  a  single  mode  system  by  an  ordinary  differential  equation,  and  applying  these 
results  to  the  interaction  of  multiple  bubbles.  We  found  that  in  a  chaotic  flow 
the  interaction  between  the  different  bubbles  causes  the  magnitude  of  the  terminal 
velocity  of  a  large  bubble  to  be  greater  than  that  predicted  by  the  single  bubble 
theory.  This  led  us  to  formulate  a  superposition  model  in  which  larger  bubbles 
“capture”  the  velocity  of  nearby  smaller  bubbles.  We  found  agreement  between  the 
velocities  predicted  by  this  simplified  model  and  those  obtained  by  direct  numerical 
simulations,  although  the  agreement  is  better  for  large  Atwood  numbers  and  low 
compressibility  than  in  the  opposite  case. 

The second part of this report treats the diffraction patterns produced by the
collision of a shock wave with a fluid interface. This process produces a variety of
complicated wave diffractions [1], [2], [15]. In the simplest case these consist of
pseudo-stationary self-similar waves that can be described by solutions to Riemann
problems for the supersonic steady-state Euler equations. In more complicated
cases, and in particular when one or both of the colliding waves are curved, these
regular diffraction patterns can bifurcate into complex composites of individual
wave interactions between the scattered waves.

The goal here is to understand the particular bifurcation behavior of the collision
of a shock in a dense fluid with an interface between the dense fluid and a much
lighter one. Two basic cases are considered: the collision of a shock in water with
a bubble of air, and the diffraction of a cylindrically expanding underwater shock
wave with the water's surface. It will be seen that initially these interactions produce
regular shock diffractions with reflected Prandtl-Meyer waves. Subsequently
these regular waves bifurcate to form anomalous waves that are analogous to
non-centered Mach reflections whose reflected waves are rarefactions. We will describe a
method to incorporate this analysis into a front tracking numerical method that allows
enhanced resolution computations of these interactions.

2.  The  Equations  of  Motion. 

In the absence of heat conduction and viscosity, fluid flow is governed by the Euler
equations, which describe the conservation of mass, momentum and energy,
respectively:

$$\partial_t \rho + \nabla \cdot (\rho \mathbf{q}) = 0, \tag{2.1a}$$

$$\partial_t (\rho \mathbf{q}) + \nabla \cdot (\rho \mathbf{q} \otimes \mathbf{q}) + \nabla P = \rho \mathbf{g}, \tag{2.1b}$$

$$\partial_t (\rho \mathcal{E}) + \nabla \cdot \left[\rho \mathbf{q} (\mathcal{E} + PV)\right] = \rho \mathbf{q} \cdot \mathbf{g}. \tag{2.1c}$$

Here, $\rho$ is the mass density, $\mathbf{q}$ is the particle velocity, $\mathbf{g}$ is the gravitational acceleration,
$\mathcal{E} = \tfrac{1}{2}|\mathbf{q}|^2 + E$ is the total specific energy, $E$ is the specific internal energy,
and $P$ is the pressure. The equilibrium thermodynamic pressure $P(V, E)$, where
$V = 1/\rho$ is the specific volume, is referred to as the equation of state and describes
the fluid properties. The numerical examples below used either the polytropic equation
of state,

$$P(V, E) = (\gamma - 1)\rho E, \tag{2.2}$$

or the stiffened polytropic equation of state [16], [25],

$$P(V, E) = \Gamma_0 \rho (E - E_\infty) - (\Gamma_0 + 1) P_\infty, \tag{2.3}$$

where $\gamma$, $\Gamma_0$, $E_\infty$ and $P_\infty$ are positive constants. In particular, all of the Rayleigh-Taylor
simulations used a polytropic equation of state with $\gamma = 1.4$.

System (2.1) is hyperbolic, with characteristic modes corresponding to the propagation
of sound waves and fluid particles through the medium. The sound waves
propagate in all directions from their source with a sound speed $c$ with respect to
the fluid, where $c^2 = \partial P / \partial \rho$ at constant entropy. Another important measure of
sound propagation is the Lagrangian sound speed, or acoustic impedance, given by
$\rho c$.
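The two equations of state, and the sound speed they imply, can be sketched as follows. Along an isentrope $dE = (P/\rho^2)\,d\rho$, so $c^2 = \partial P/\partial \rho|_E + (P/\rho^2)\,\partial P/\partial E|_\rho$. For the stiffened form $P = \Gamma_0 \rho (E - E_\infty) - (\Gamma_0 + 1) P_\infty$ this gives $c^2 = (\Gamma_0 + 1)(P + P_\infty)/\rho$, which reduces to the polytropic $c^2 = \gamma P/\rho$ when $\Gamma_0 = \gamma - 1$ and $P_\infty = E_\infty = 0$. The function names are ours, and any parameter values used with them are illustrative, not those of the simulations.

```python
import math

def polytropic_pressure(rho, E, gamma):
    """Equation (2.2): P = (γ - 1) ρ E."""
    return (gamma - 1.0) * rho * E

def stiffened_pressure(rho, E, Gamma0, E_inf, P_inf):
    """Equation (2.3): P = Γ0 ρ (E - E∞) - (Γ0 + 1) P∞."""
    return Gamma0 * rho * (E - E_inf) - (Gamma0 + 1.0) * P_inf

def stiffened_sound_speed(rho, P, Gamma0, P_inf):
    """Isentropic sound speed for (2.3): c² = (Γ0 + 1)(P + P∞) / ρ.

    With Γ0 = γ - 1 and P∞ = 0 this is the polytropic c² = γ P / ρ.
    """
    return math.sqrt((Gamma0 + 1.0) * (P + P_inf) / rho)

def acoustic_impedance(rho, c):
    """Lagrangian sound speed (acoustic impedance) ρ c."""
    return rho * c
```

A quick consistency check is to perturb $\rho$ along the isentrope, updating $E$ by $(P/\rho^2)\,\delta\rho$, and compare the resulting finite-difference slope of $P$ with $c^2$.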

3.  Motion  of  single  mode  bubbles  and  spikes. 

For a given equation of state, the two-fluid mixing problem is characterized by
two dimensionless quantities. The first of these is a relative measure of the difference
in densities between the two fluids, the Atwood number

$$A = \frac{\rho_h - \rho_l}{\rho_h + \rho_l}.$$

The second measures the compressibility of the heavy fluid. If $\lambda$ is the wavelength of the
perturbation and $c_h$ is the speed of sound in the heavy fluid, we define the dimensionless
compressibility to be

$$C^2 = \frac{\lambda g}{c_h^2}.$$

Our goal is to study the overall behavior of the unstable mixing between the two fluids for a
range of Atwood numbers and compressibilities.

Fig. 1 Plots of the interface position, and density and pressure contours
for $A = 1/5$, $C^2 = 0.5$, and $\gamma = 1.4$ in a $1 \times 6$ computational
domain with a $40 \times 240$ grid. Only the upper two thirds of the
computational region is shown in the plot because nothing of interest
occurs in the remainder of the computation. The interface
position for successive time steps is shown in (a), while (b) and
(c) show contours of density and pressure respectively. Gravity is
directed downward.
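The two dimensionless parameters are simple to evaluate. In this sketch the compressibility is taken as $C^2 = \lambda g / c_h^2$, with $\lambda$ the perturbation wavelength, $g$ the magnitude of gravity and $c_h$ the sound speed in the heavy fluid; the function names are illustrative.

```python
def atwood_number(rho_heavy, rho_light):
    """A = (ρ_h - ρ_l) / (ρ_h + ρ_l)."""
    return (rho_heavy - rho_light) / (rho_heavy + rho_light)

def compressibility_squared(wavelength, g, c_heavy):
    """C² = λ g / c_h²."""
    return wavelength * g / c_heavy ** 2
```

For example, a density ratio of 2 gives $A = 1/3$, the value used in Figs. 3 and 4, and a ratio of 3:2 gives $A = 1/5$.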

For a polytropic equation of state, the equilibrium solution of the Euler equations
is an exponentially stratified distribution of density and pressure along the direction
of gravity. We used the solution to a linearized perturbation of this equilibrium
solution [3], [8] to provide the Cauchy data for a full Euler simulation. Here we
consider the single mode system, which is a periodic array of bubbles and spikes.
The top and bottom of the computational domain are reflecting boundaries.

Figs. 1 and 2 show computational results for two different simulations with $C^2 =
0.5$ and two different Atwood numbers. If the Atwood number is small (Fig. 1), two
interpenetrating fingers of similar shape are formed, with secondary instabilities
appearing along the side of the spike. As $A \to 0$, the pattern of the two fluids
becomes symmetric with a phase difference $\pi$. For larger Atwood numbers (Fig. 2),
the spike is thinner, with less roll-up shed off the edge of its tip. If the compressibility
is high, the velocity of the bubble or spike will eventually become supersonic relative
to the heavy material but will remain subsonic in the light material. We refer to
[8] for the details of these studies.

Fig. 2 Plots of the interface position, density and pressure contours
for $A = 0.01$, $C^2 = 0.5$, and $\gamma = 1.4$ in a $1 \times 10$ computational domain
with a $20 \times 200$ grid. Only the upper four fifths of the computational
region is shown in the plot because nothing of interest occurs in the
remainder of the computation. The interface position for successive
time steps is shown in (a), while (b) and (c) show contours of
density and pressure respectively. Gravity is directed downward.

A bubble or spike that arises from a small amplitude disturbance goes through
three regimes: an initial stage governed by the linearized equations, a period of free
fall, and a final terminal velocity phase. During the linear stage, the velocity grows
exponentially with time; we denote this growth rate by $\sigma$. In the free fall regime the
velocity varies linearly with time and the acceleration reaches a maximum absolute
value called the renormalized gravity $g_R$. Finally the velocity approaches a limiting
value (the terminal velocity $v_\infty$) with a decay rate $b$. These three regimes are illustrated
in Fig. 3, which shows plots of the spike velocity and acceleration versus time.

By using curve fitting through the three growth regimes of the spike or bubble,
it is possible to describe its motion by the ordinary differential equation

$$\frac{dv}{dt} = \left[\frac{1}{\sigma v} + \frac{1}{g_R} + \frac{1}{b\,(v_\infty - v)}\right]^{-1}, \tag{3.1}$$

which has solution

$$t - t_0 = \frac{1}{\sigma} \ln\frac{v}{v_0} + \frac{v - v_0}{g_R} + \frac{1}{b} \ln\frac{v_\infty - v_0}{v_\infty - v}. \tag{3.2}$$

Fig. 3 The comparison of the spike velocity and the spike acceleration
obtained numerically with the asymptotic behavior in each
regime. $A = 1/3$, $C^2 = 0.5$ and $\gamma = 1.4$. The solid lines are
the numerical results obtained by using an $80 \times 640$ grid in a $1 \times 8$
computational domain.

The first term in (3.2) is the contribution from the linear regime, the second is
that of the free fall regime, and the third comes from the asymptotic terminal
velocity. Extensive validation of this model has been performed for a range of
Atwood numbers and compressibilities. The dependence of $\sigma$, $g_R$, $b$ and $v_\infty$ on
$A$ and $C$ is described in [35]. Fig. 4 shows a comparison between a numerical
simulation of the full two-dimensional Euler equations and the curve given by (3.2).
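A sketch of the three-regime model, assuming (as the description of the three terms of (3.2) indicates) that $dt/dv$ is the sum of a linear-regime term $1/(\sigma v)$, a free-fall term $1/g_R$ and a terminal term $1/(b(v_\infty - v))$. It integrates (3.1) with classical fourth-order Runge-Kutta and checks the result against the closed form (3.2). The parameter values in the usage below ($\sigma = 2$, $g_R = 0.5$, $b = 1$, $v_\infty = 1$, $v_0 = 0.01$) are made up for illustration and are not fitted values from the simulations.

```python
import math

def bubble_acceleration(v, sigma, g_R, b, v_inf):
    """dv/dt = 1 / (1/(σ v) + 1/g_R + 1/(b (v∞ - v)))."""
    return 1.0 / (1.0 / (sigma * v) + 1.0 / g_R + 1.0 / (b * (v_inf - v)))

def elapsed_time(v, v0, sigma, g_R, b, v_inf):
    """Closed form t - t0 for going from v0 to v: linear, free-fall and terminal terms."""
    return (math.log(v / v0) / sigma
            + (v - v0) / g_R
            + math.log((v_inf - v0) / (v_inf - v)) / b)

def integrate_rk4(v0, t_end, dt, sigma, g_R, b, v_inf):
    """Classical fourth-order Runge-Kutta integration of the model from t = 0 to t_end."""
    rhs = lambda v: bubble_acceleration(v, sigma, g_R, b, v_inf)
    v = v0
    for _ in range(round(t_end / dt)):
        k1 = rhs(v)
        k2 = rhs(v + 0.5 * dt * k1)
        k3 = rhs(v + 0.5 * dt * k2)
        k4 = rhs(v + dt * k3)
        v += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return v
```

With these values, `integrate_rk4(0.01, 5.0, 1e-3, 2.0, 0.5, 1.0, 1.0)` returns a velocity near 0.75, and `elapsed_time` evaluated there recovers $t = 5$ to within the integration error. Because the bracket in (3.1) always exceeds $1/g_R$, the acceleration never exceeds $g_R$, consistent with its role as the maximum acceleration.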

From a dimensional argument, the terminal velocity of the bubble should be
proportional to $\sqrt{\lambda g}$, where the constant of proportionality $c_1$ depends only on the
dimensionless parameters $A$, $C$ and $\gamma$. Fig. 5 shows a plot of $c_1$ for a range of
Atwood numbers and compressibilities. We see that $c_1$ depends strongly on $C$, and
for small fixed values of $C$ it varies approximately as $\sqrt{A}$. We did not explore the dependence
of $c_1$ on $\gamma$ in this study.

4.  Interaction  between  bubbles. 

Multiple bubble interactions are initialized with an ensemble of bubbles of different wavelengths. When started at small amplitudes, shorter wavelength bubbles have higher growth rates than the larger bubbles. However, the short wavelength bubbles saturate at smaller terminal velocities than the larger ones. Thus, while the small bubbles initially run faster, the larger ones catch up and overtake them, emerging on the outer envelope of the interface between the fluids. It was discovered that bubble interaction causes the terminal velocity of the large bubbles to exceed the prediction based on the single bubble theory for a bubble of comparable wavelength. As a large bubble overtakes a smaller one, it absorbs the velocity of that bubble, which in turn is washed away downstream. We call this process bubble merger since it reduces the number of bubbles in the outer envelope. This number is reduced by a factor of $2^n$ after $n$ generations of bubble merger, a phenomenon that was observed in the experiments of Read [27] as well as in our numerical simulations [11]. The interface configuration of a random multiple bubble system is shown in Fig. 6. We see that the small structures (bubbles) merge into large structures.

Fig. 4 Plots of spike velocity and bubble velocity versus time superimposed over the best three-parameter fit to the solution of the ODE model. The parameter values are $A = 1/3$, $C^2 = 0.5$, and $\gamma = 1.4$. The numerical results are obtained using an $80 \times 640$ grid in a $1 \times 8$ computational domain.

Fig. 5 The dependence of $c_1$ on $A$ and $C$. Note that $c_1$ has a strong dependence on $C$. For a given value of $C^2$, the dependence on $A$ is approximately $\sqrt{A}$ in systems of low compressibility. The value of $c_1$ for an incompressible fluid ($C^2 = 0$) is taken from reference [21].

Fig. 6 Plots of interfaces in a random disturbance simulation of the Rayleigh-Taylor instability. The Atwood number is $A = 1/3$ and the compressibility is $C^2 = 0.1$. The acceleration of the bubble envelope is in good agreement with the experiment of Read for 1.5 generations of bubble merger. The acceleration decreases after this time due to the multiphase connectivity, which is different in the exactly two-dimensional computation from the approximately two-dimensional experiments. Gravity is directed downward.
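The halving of the bubble count with each merger generation can be tabulated directly. The sketch below additionally assumes, for illustration only, that the envelope bubbles tile a fixed domain width, so the mean bubble wavelength doubles with each generation; `domain_width` and `n_initial` are hypothetical inputs, not values from the simulations in [11].

```python
def bubbles_after_mergers(n_initial: int, generations: int) -> float:
    """Number of envelope bubbles after the given number of merger generations.

    Each generation of bubble merger halves the count: N_n = N_0 / 2**n.
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return n_initial / 2**generations

def mean_wavelength(domain_width: float, n_initial: int, generations: int) -> float:
    """Mean bubble wavelength, assuming the bubbles tile a fixed domain width."""
    return domain_width / bubbles_after_mergers(n_initial, generations)

if __name__ == "__main__":
    for n in range(4):
        print(n, bubbles_after_mergers(16, n), mean_wavelength(1.0, 16, n))
```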

We propose a simple superposition model for the bubble velocity in the chaotic regime. The basic idea is to treat the envelope of the bubbles as a single bubble of long wavelength. The velocities of the individual bubbles and of the bubble envelope are first computed from the single bubble theory; the hypothesis is that, to leading order, the total velocity of each bubble is the sum of its single bubble theory velocity and the velocity of the envelope. More advanced bubbles are in phase with this envelope, so the superposition is constructive and their velocity is increased. On the other hand, a less advanced bubble is out of phase with the envelope, causing its net velocity to decrease. During the initial small amplitude regime, the envelope grows slowly because of its longer wavelength, so the bubble velocity is dominated by the individual bubble contribution; at later times the envelope velocity is the main contribution to the bubble velocity.

Fig. 7 Successive times in a two bubble merger process. The compressibility and Atwood number for this case are $C^2 = 0.1$ and $A = 2/3$, respectively. It can be seen that the large bubble overtakes the smaller one at $gt/c = 1.0$. The velocity of the large bubble is accelerated during the merger while the velocity of the small bubble is reversed, see Fig. 8.
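A minimal sketch of the superposition hypothesis, assuming the single-bubble and envelope velocities are already known. The `phase` field (+1 for a bubble in phase with the envelope, -1 for one out of phase) is a simplification introduced here for illustration; it is not the authors' implementation, which derives both contributions from the single bubble theory.

```python
from dataclasses import dataclass

@dataclass
class Bubble:
    single_mode_velocity: float  # velocity from the single bubble theory
    phase: int                   # +1: in phase with the envelope, -1: out of phase

def superposed_velocity(bubble: Bubble, envelope_velocity: float) -> float:
    """Leading-order total velocity: single-bubble velocity plus signed envelope velocity."""
    if bubble.phase not in (1, -1):
        raise ValueError("phase must be +1 or -1")
    return bubble.single_mode_velocity + bubble.phase * envelope_velocity

if __name__ == "__main__":
    leader = Bubble(single_mode_velocity=1.0, phase=+1)
    trailer = Bubble(single_mode_velocity=0.6, phase=-1)
    # Early times: the envelope is slow, so the single-mode velocities dominate.
    print(superposed_velocity(leader, 0.1), superposed_velocity(trailer, 0.1))
    # Later: the envelope dominates and the trailing bubble's velocity reverses sign.
    print(superposed_velocity(leader, 1.5), superposed_velocity(trailer, 1.5))
```

The illustrative numbers reproduce the qualitative behavior described for Fig. 8: once the envelope velocity exceeds the small bubble's own velocity, its net velocity changes sign and it is washed downstream.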

We compared the results of this superposition model with the experimental results of Read [27] and with our numerical simulations of the full Euler equations. The relative error between the superposition theory and the experimental or computational data was less than 20% for systems with larger Atwood numbers and $C^2 < 0.1$, and about 30% for systems with small Atwood numbers or large compressibility. In the latter case, the density stratification of the fluids causes the superposition principle to break down in finite time [11].

Fig.  7  shows  the  interface  between  two  fluids  at  successive  times  in  a  two  bubble 
merger  process  and  Fig.  8  shows  a  comparison  of  the  velocities  of  these  bubbles  and 
the  predictions  obtained  from  the  superposition  model.  The  behavior  of  the  small 
bubble  velocity  clearly  indicates  the  contribution  from  the  envelope.  At  first  the 
single  mode  bubble  velocity  dominates  since  the  envelope  has  a  small  growth  rate. 
The  bubble  stops  accelerating  when  the  small  bubble  and  the  envelope  have  equal 
but  opposite  velocities.  After  that,  the  envelope  velocity  dominates  and  the  small 
bubble decelerates and is washed away downstream.

It would be expected from the existence of a terminal velocity in the single bubble theory that the asymptotic position of a bubble would be proportional to time. However, in a chaotic flow, interactions cause the radius of a large bubble to increase, consequently raising its terminal velocity. This led to the prediction that the position of the tip of a large bubble is proportional to $t^2$, $z = \alpha A g t^2$. In his experiments, Read [27] reported a range of values for $\alpha$, a typical value being $\alpha = 0.06$. Youngs [34] and Zufiria [36] reported values of $\alpha$ ranging from 0.04–0.05 and 0.05–0.06, respectively, based on their numerical simulations. Our simulations indicated that $\alpha$ is not a constant. Rather, it varies during the interaction from an early value of 0.055–0.065 to 0.038–0.044 at late stages of the interaction [11]. The reduction of $\alpha$ from about 0.06 to about 0.04 is due to the multi-connectivity of the interface in the deep chaotic regime. In Youngs' numerical simulations [34], the interface between the two fluids was not tracked, so effective multi-connectivity occurred in the early stages of his simulations. This may explain the small values of $\alpha$ that he observed. The discrepancy between the value of $\alpha$ at late times in our numerical simulations and the value observed in Read's experiments results from the difference between an exactly two-dimensional numerical simulation and an approximately two-dimensional experiment. In Read's experiments the ratio of width to thickness was six to one, and fluid segments that are isolated in the $x$–$z$ plane of the computations might be connected in the third dimension ($y$ direction). Such discrepancies may be resolved by three-dimensional calculations, which will provide a more realistic approximation of the experimental conditions.

Fig. 8 A plot of bubble velocities vs. time for the two bubble merger simulation. The small bubble is accelerated at the beginning and is then decelerated after about $gt/c = 0.42$. The small bubble is washed downstream after its velocity is reversed, while the large bubble is under constant acceleration. The smooth curves represent the bubble motion as predicted by the superposition model.
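Extracting $\alpha$ from tip-position data is a one-parameter least-squares problem: fitting $z = \alpha A g t^2$ through the origin gives $\alpha = \sum_k z_k t_k^2 / (A g \sum_k t_k^4)$. The sketch below assumes the tip positions have already been measured from the tracked interface and does not reproduce the procedure of [11]; fitting over consecutive time windows is one simple way (our choice, for illustration) to expose a time-varying $\alpha$ such as the one reported above.

```python
from typing import List, Sequence

def fit_alpha(times: Sequence[float], positions: Sequence[float],
              atwood: float, g: float) -> float:
    """Least-squares estimate of alpha in z = alpha * A * g * t**2 (fit through the origin)."""
    if len(times) != len(positions) or len(times) == 0:
        raise ValueError("need equal-length, non-empty samples")
    num = sum(z * t**2 for t, z in zip(times, positions))
    den = atwood * g * sum(t**4 for t in times)
    if den == 0:
        raise ValueError("degenerate data: all sample times are zero")
    return num / den

def windowed_alpha(times: Sequence[float], positions: Sequence[float],
                   atwood: float, g: float, window: int) -> List[float]:
    """Fit alpha on consecutive, non-overlapping windows of `window` samples."""
    return [fit_alpha(times[i:i + window], positions[i:i + window], atwood, g)
            for i in range(0, len(times) - window + 1, window)]

if __name__ == "__main__":
    A, g = 1 / 3, 1.0
    ts = [0.5 * k for k in range(1, 21)]
    zs = [0.05 * A * g * t**2 for t in ts]  # synthetic data generated with alpha = 0.05
    print(fit_alpha(ts, zs, A, g), windowed_alpha(ts, zs, A, g, 5))
```

The synthetic data above are generated from the law itself to check the fit; they are not results from this study.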

When compressibility effects are significant, the stratification of the density in the unperturbed fluid causes the effective Atwood number at the tip of a bubble to decrease as the bubble moves into the heavy fluid. The bubble velocity is nonmonotone and may even reverse direction. Since this factor was not taken into account in the single mode theory, our superposition theory breaks down when the effective Atwood number has been substantially reduced. To get a better understanding of the phenomenon of velocity turnover in a single mode system and of the failure of the superposition hypothesis in a multi-mode system, we use the initial density distribution of the light and heavy fluids to approximate the effective dynamic Atwood number $A_e$. For a flat interface, the density distribution is

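$$\rho_i(z) = \rho_i(0)\exp\left(-\frac{\gamma g z}{c_i^2}\right), \qquad i = l, h, \tag{4.1}$$
where $c_i$ is the sound speed of fluid $i$.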

When  a  bubble  reaches  the  position  z,  we  approximate  the  effective  Atwood  number 
as 


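$$A_e(z) = \frac{\rho_h(z) - \rho_l(z)}{\rho_h(z) + \rho_l(z)} = \frac{(1+A)\exp\left[\gamma g z\left(c_l^{-2} - c_h^{-2}\right)\right] - (1-A)}{(1+A)\exp\left[\gamma g z\left(c_l^{-2} - c_h^{-2}\right)\right] + (1-A)}.$$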


For a single mode system, the turnover phenomenon should occur before the effective Atwood number $A_e$ vanishes. For a multi-mode system, the superposition model is applicable as long as $A_e \approx A = A_e(0)$. In Fig. 9, we plot the approximate effective Atwood number versus the bubble position $z$. Since $A_e$ decreases more rapidly in a system with a small Atwood number or large compressibility, the superposition model fails at smaller values of $z$ in these systems.
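The effective Atwood number can be evaluated directly from the density ratio $\rho_h(z)/\rho_l(z)$ implied by the stratification (4.1). A minimal sketch, assuming a common $\gamma$ for both fluids as in (4.1); the sound speeds and positions in the usage example are illustrative inputs, not the parameters of Fig. 9. When the light fluid has the larger sound speed (as for two ideal gases with a common $\gamma$ at equal pressure), the exponent is negative for $z > 0$ and $A_e$ decays as the bubble advances.

```python
import math

def effective_atwood(z: float, atwood: float, gamma: float, g: float,
                     c_l: float, c_h: float) -> float:
    """Approximate effective Atwood number A_e(z) from stratified densities.

    rho_i(z) = rho_i(0) * exp(-gamma * g * z / c_i**2), i = l, h, so that
    rho_h/rho_l = (1 + A)/(1 - A) * exp(gamma * g * z * (1/c_l**2 - 1/c_h**2)).
    """
    if not -1.0 < atwood < 1.0:
        raise ValueError("Atwood number must lie in (-1, 1)")
    growth = math.exp(gamma * g * z * (1.0 / c_l**2 - 1.0 / c_h**2))
    heavy = (1.0 + atwood) * growth   # proportional to rho_h(z)
    light = 1.0 - atwood              # proportional to rho_l(z)
    return (heavy - light) / (heavy + light)

if __name__ == "__main__":
    # Illustrative parameters only: c_l > c_h, so A_e decreases with height z.
    for z in (0.0, 0.5, 1.0, 2.0):
        print(z, effective_atwood(z, atwood=1 / 3, gamma=1.4, g=1.0, c_l=1.0, c_h=0.8))
```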

One should not confuse the turnover of the bubble velocity in a single mode system with the turnover of the velocity of a small bubble in a multi-mode system. The former is due to the stratified density distribution and the latter is due to the interactions between bubbles, i.e., the contribution of the envelope velocity to the total velocity of the small bubble.

5.  Elementary  Wave  Nodes  and  the  Supersonic  Steady  State  Riemann 
Problem. 

We now turn our attention to an investigation of wave interactions between shock waves and fluid interfaces. An elementary wave node is a point of interaction between two waves that is both stationary and self-similar [14]. Gravity will be neglected here since the interactions considered occur on short time scales. It can be shown [10], [19, pp. 405-409] that there are four basic elementary nodes. These are the crossing of two shocks moving in opposite directions (cross node), the overtaking of one shock by another moving in the same direction (overtake node), the collision of a shock with a fluid interface (diffraction node), and the splitting of a shock wave due to interaction with other waves or boundaries to produce a Mach reflection (Mach node). All of these nodes are characterized by the solution of a Riemann problem for a steady state flow, where the data are provided by the states behind the interacting waves. We will primarily be concerned with the diffraction node, but bifurcations in this node will lead to the production of all of the other elementary nodes.

Fig. 9 The approximate effective Atwood number as the bubble reaches position $z$. $A_e$ decreases more rapidly in the system with small initial Atwood number or large compressibility than in the system with large initial Atwood number and small compressibility. The decrease of the effective Atwood number is the source of the turnover phenomenon in single mode systems and of the failure of the superposition model in multi-mode systems.

For a stationary planar flow, system (2.1) reduces to a $4 \times 4$ system that is hyperbolic in the restricted variables provided the Mach number $M = |\mathbf{q}|/c$ is greater than one, i.e., the flow is supersonic. The streamlines or particle trajectories define the time-like direction. The hyperbolic modes in this case are associated with two families of sound waves and a linearly degenerate double characteristic family. If $\theta$ and $q$ are the polar coordinates of the particle velocity $\mathbf{q}$, then the sonic waves have characteristic directions with polar angles $\theta \pm A$, where $A$ is the Mach angle, $\sin A = M^{-1}$. Waves of these families are either stationary shock waves or steady state centered rarefaction waves called Prandtl-Meyer waves. Waves of the degenerate family are a combination of a contact discontinuity and a vortex sheet across which the pressure and flow direction $\theta$ are continuous while the other variables may experience jumps.
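The characteristic directions above are straightforward to evaluate. A minimal sketch, assuming the flow direction $\theta$ is given in radians; the function names are illustrative and are not taken from the tracking code.

```python
import math
from typing import Tuple

def mach_angle(mach: float) -> float:
    """Mach angle A defined by sin A = 1/M (radians); requires supersonic flow."""
    if mach <= 1.0:
        raise ValueError("the steady equations are hyperbolic only for M > 1")
    return math.asin(1.0 / mach)

def sonic_characteristic_directions(theta: float, mach: float) -> Tuple[float, float]:
    """Polar angles theta - A and theta + A of the two sonic characteristic families."""
    a = mach_angle(mach)
    return theta - a, theta + a

if __name__ == "__main__":
    lo, hi = sonic_characteristic_directions(0.0, 2.0)
    print(math.degrees(lo), math.degrees(hi))  # approximately -30 and +30 degrees
```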

Following the general analysis of systems of hyperbolic conservation laws [20], we see that the wave curve for a sonic wave family consists of two branches corresponding to either a shock or a simple wave. The shock branch is commonly called a shock polar [6, pp. 294-317] and forms a closed and bounded loop, the two sonic families meeting at the point where the stationary shock is normal to the incoming flow. If we let the state ahead of the wave be denoted by the subscript 0, a straightforward derivation of the Rankine-Hugoniot equations for the system (2.1) shows that the thermodynamic states on either side of the shock are related by the Hugoniot equation

$$E = E_0 + \frac{1}{2}(P + P_0)(V_0 - V). \tag{5.1}$$

A  similar  derivation  applied  to  the  steady  state  Euler  equations  shows  that  the  flow 
velocities  on  either  side  of  a  stationary  oblique  shock  satisfy 

$$\frac{1}{2}q^2 + H = \frac{1}{2}q_0^2 + H_0, \tag{5.2}$$

where  H  =  E  +  PV  is  the  specific  enthalpy.  The  jump  in  the  flow  direction  is  given 
by 

$$\tan(\theta - \theta_0) = \pm\left(\frac{P - P_0}{\rho_0 q_0^2 - (P - P_0)}\right)\cot\beta. \tag{5.3}$$

Here $\beta$ is the angle between the incoming streamline and the shock wave, and is given by $\sin\beta = \sigma/q_0$, where $\sigma = V_0 m$ is the wave speed of the shock wave with respect to the fluid ahead and $m$ is the mass flux across the shock, $m^2 = -\Delta P/\Delta V$. The difference between the flow directions on either side of the shock is called the turning angle of the wave.
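To make (5.1)–(5.3) concrete, the sketch below traces the + branch of the shock polar for a polytropic gas, $E = PV/(\gamma - 1)$, one of the equations of state used in the numerical examples (the analysis in the text does not depend on this choice). For this equation of state the Hugoniot (5.1) can be solved in closed form for $V(P)$; the mass flux then follows from $m^2 = -\Delta P/\Delta V$, the incidence angle from $\sin\beta = V_0 m/q_0$, and the turning angle from (5.3). Function names and upstream values are illustrative assumptions.

```python
import math
from typing import List, Optional, Tuple

def hugoniot_volume(p: float, p0: float, v0: float, gamma: float) -> float:
    """Specific volume V behind the shock: Hugoniot (5.1) solved for E = P V / (gamma - 1)."""
    return v0 * ((gamma + 1) * p0 + (gamma - 1) * p) / ((gamma + 1) * p + (gamma - 1) * p0)

def shock_polar_point(p: float, p0: float, rho0: float, q0: float,
                      gamma: float) -> Optional[Tuple[float, float]]:
    """(beta, theta - theta0) on the + branch of the shock polar for post-shock pressure p.

    Returns None if no stationary shock of pressure p exists in a stream of speed q0.
    """
    if p <= p0:
        raise ValueError("a compressive shock requires p > p0")
    v0 = 1.0 / rho0
    v = hugoniot_volume(p, p0, v0, gamma)
    m = math.sqrt((p - p0) / (v0 - v))   # mass flux from m^2 = -dP/dV
    sin_beta = v0 * m / q0               # sin(beta) = sigma/q0, with sigma = V0 m
    if sin_beta > 1.0:
        return None
    beta = math.asin(sin_beta)
    dp = p - p0
    turning = math.atan(dp / (rho0 * q0**2 - dp) / math.tan(beta))  # eq. (5.3), + branch
    return beta, turning

def trace_shock_polar(p0: float, rho0: float, q0: float, gamma: float,
                      n: int = 200) -> List[Tuple[float, float]]:
    """Sample (P, theta - theta0) from a weak shock up to (not including) the normal shock."""
    mach0_sq = rho0 * q0**2 / (gamma * p0)
    p_normal = p0 * (1 + 2 * gamma / (gamma + 1) * (mach0_sq - 1))  # normal-shock pressure
    samples = []
    for k in range(1, n + 1):
        p = p0 + (p_normal - p0) * k / (n + 1)
        point = shock_polar_point(p, p0, rho0, q0, gamma)
        if point is not None:
            samples.append((p, point[1]))
    return samples

if __name__ == "__main__":
    gam, rho0, p0 = 1.4, 1.0, 1.0
    q0 = 2.0 * math.sqrt(gam * p0 / rho0)    # upstream Mach number 2
    polar = trace_shock_polar(p0, rho0, q0, gam)
    print(max(t for _, t in polar))          # maximum turning angle (radians)
```

The turning angle vanishes at both ends of the branch (weak shock and normal shock), so the branch has an interior maximum; together with the mirror-image branch this gives the closed loop described in the text.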

The  same  analysis  when  applied  to  the  simple  wave  curves  shows  that  the  entropy 
is  constant  inside  a  Prandtl-Meyer  wave.  The  flow  speed  and  flow  direction  are 
related by (5.2), where $H = H(P, S_0)$, and

$$\theta = \theta_0 \mp \int_{P_0}^{P} \left.\frac{\cos A}{\rho c q}\right|_{S = S_0} dP. \tag{5.4}$$

In  analogy  to  the  shock  polar  defined  by  (5.1)-(5.3)  we  will  call  this  locus  of  states 
the  rarefaction  polar. 
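For a polytropic gas the integral in (5.4) can be written in terms of the classical Prandtl-Meyer function $\nu(M)$: across an expansion from $M_0$ to $M$, the magnitude of the change in flow direction is $\nu(M) - \nu(M_0)$. The sketch below evaluates $\nu$ in closed form and inverts it by bisection; it is specific to the polytropic equation of state, and the function names are illustrative.

```python
import math

def prandtl_meyer(mach: float, gamma: float) -> float:
    """Prandtl-Meyer function nu(M) in radians for a polytropic gas (M >= 1)."""
    if mach < 1.0:
        raise ValueError("nu(M) is defined only for M >= 1")
    k = (gamma + 1.0) / (gamma - 1.0)
    return (math.sqrt(k) * math.atan(math.sqrt((mach**2 - 1.0) / k))
            - math.atan(math.sqrt(mach**2 - 1.0)))

def mach_after_expansion(mach0: float, turn: float, gamma: float,
                         tol: float = 1e-12) -> float:
    """Mach number after an isentropic expansion that turns the flow by |turn| radians."""
    target = prandtl_meyer(mach0, gamma) + abs(turn)
    nu_max = 0.5 * math.pi * (math.sqrt((gamma + 1) / (gamma - 1)) - 1)  # limit as M -> inf
    if target >= nu_max:
        raise ValueError("turn exceeds the maximum Prandtl-Meyer expansion")
    lo, hi = mach0, 2.0 * mach0
    while prandtl_meyer(hi, gamma) < target:  # nu is increasing in M, so grow the bracket
        hi *= 2.0
    while hi - lo > tol * hi:                 # bisection on nu(M) = target
        mid = 0.5 * (lo + hi)
        if prandtl_meyer(mid, gamma) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print(math.degrees(prandtl_meyer(2.0, 1.4)))  # about 26.38 degrees
```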

It is easily checked that the two branches of (5.4) are respectively associated with the $\theta \pm A$ characteristic directions in the sense of Lax [20]. Similarly, it can be shown [15] that for most equations of state the two branches of (5.3) are also associated with the $\theta \pm A$ characteristics in the sense of Lax, provided the state downstream from the shock is supersonic. Since $\theta$ and $P$ are constant across waves of the degenerate middle family, the Riemann problem for a stationary two-dimensional flow can be solved by finding the intersection of the projections of the wave curves in the $\theta$–$P$ phase plane.
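Numerically, intersecting the two projected wave curves reduces to scalar root finding in $P$. The sketch below assumes each wave curve has already been parameterized as a function $\theta(P)$ (for example from the shock polar (5.1)–(5.3) or the rarefaction polar (5.4)) and that their difference changes sign on a known bracket. Bisection is used for robustness; this is an illustrative choice, and a full solver would also have to select the correct branches and handle cases where the curves do not intersect.

```python
from typing import Callable, Tuple

Curve = Callable[[float], float]

def intersect_wave_curves(theta_a: Curve, theta_b: Curve,
                          p_lo: float, p_hi: float,
                          tol: float = 1e-12, max_iter: int = 200) -> Tuple[float, float]:
    """Find (P, theta) where two projected wave curves cross in the theta-P plane."""
    def gap(p: float) -> float:
        return theta_a(p) - theta_b(p)

    f_lo = gap(p_lo)
    if f_lo * gap(p_hi) > 0:
        raise ValueError("curves do not cross on the bracket: no solution found")
    for _ in range(max_iter):
        p_mid = 0.5 * (p_lo + p_hi)
        f_mid = gap(p_mid)
        if f_lo * f_mid <= 0:
            p_hi = p_mid            # root lies in [p_lo, p_mid]
        else:
            p_lo, f_lo = p_mid, f_mid
        if p_hi - p_lo < tol:
            break
    p_star = 0.5 * (p_lo + p_hi)
    return p_star, theta_a(p_star)

if __name__ == "__main__":
    # Hypothetical linear stand-ins for the two wave curves (not real polars).
    print(intersect_wave_curves(lambda p: 0.1 * (p - 1), lambda p: 0.2 - 0.05 * (p - 1), 1.0, 5.0))
```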

There are two major differences between the solution to the Riemann problem for a stationary flow and that of a one-dimensional unsteady flow. The first is that the flow behind a stationary shock need not be supersonic. The Mach number behind the shock wave is given by

$$M = \frac{m}{\rho c}\left(1 + \frac{\rho^2}{\rho_0^2}\cot^2\beta\right)^{1/2}. \tag{5.5}$$

For most equations of state [22], $m < \rho c$, and $m$ is a monotone function of the pressure along the shock Hugoniot. Thus if $\beta$ is sufficiently close to $\pi/2$, the flow behind the shock will be subsonic and the steady Euler equations cease to be hyperbolic. The second difference is that for a normal angle of incidence the turning angle through the shock is zero. This means that the two branches of the shock polar meet at this point, forming a closed and bounded loop. Together, these two issues imply a loss of existence and uniqueness for the solution to the two-dimensional stationary Riemann problem, so a bifurcation must occur from a stationary solution to a time-dependent solution of the full two-dimensional Euler equations.
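For a polytropic gas, (5.5) can be evaluated along the shock polar using the normal-shock relations, which makes the loss of supersonic flow as $\beta \to \pi/2$ easy to see. The sketch below assumes the polytropic equation of state, nondimensionalizes with $\rho_0 = 1$ and $c_0 = 1$ (so $q_0 = M_0$ and $P_0 = 1/\gamma$), and locates the sonic incidence angle by bisection between the Mach angle and the normal shock; the function names are illustrative.

```python
import math

def post_shock_mach(mach0: float, beta: float, gamma: float) -> float:
    """Mach number behind a stationary oblique shock, eq. (5.5), for a polytropic gas.

    Units: rho0 = 1 and c0 = 1, so q0 = mach0 and P0 = 1/gamma.
    """
    mn0 = mach0 * math.sin(beta)                               # upstream normal Mach number
    if mn0 < 1.0:
        raise ValueError("beta is below the Mach angle: no shock")
    rho = (gamma + 1) * mn0**2 / ((gamma - 1) * mn0**2 + 2)   # rho/rho0
    p_ratio = 1 + 2 * gamma / (gamma + 1) * (mn0**2 - 1)      # P/P0
    c = math.sqrt(p_ratio / rho)                               # c^2 = gamma P / rho, P0 = 1/gamma
    m = mach0 * math.sin(beta)                                 # mass flux m = rho0 q0 sin(beta)
    cot_b = math.cos(beta) / math.sin(beta)
    return m / (rho * c) * math.sqrt(1 + rho**2 * cot_b**2)

def sonic_angle(mach0: float, gamma: float, tol: float = 1e-12) -> float:
    """Incidence angle beta at which the flow behind the shock becomes sonic (M = 1)."""
    lo = math.asin(1.0 / mach0)   # Mach angle: vanishing shock, flow behind stays supersonic
    hi = 0.5 * math.pi            # normal shock: flow behind is subsonic
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if post_shock_mach(mach0, mid, gamma) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    b = sonic_angle(2.0, 1.4)
    print(math.degrees(b), post_shock_mach(2.0, b, 1.4))
```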

The actual shape and properties of the shock and rarefaction polars depend
on  the  equation  of  state.  We  will  make  no  use  of  a  specific  choice  of  equation  of 
state  in  our  analysis,  but  we  will  need  to  assume  that  the  equation  of  state  satisfies 
appropriate  conditions  to  guarantee  that  the  shock  polar  has  a  unique  point  at 
which  the  state  behind  the  shock  becomes  sonic,  and  a  unique  local  extremum  in 
the  turning  angle.  These  conditions  are  satisfied  by  most  ordinary  equations  of 
state,  and  in  particular  by  the  polytropic  and  stiffened  polytropic  equations  of  state 
used  in  the  numerical  examples. 

6.  Anomalous  Reflection. 

As was mentioned above, the simplest case of shock diffraction is that in which the flow near a point of diffraction is scale invariant and pseudo-stationary. This will be the case provided the flow is sufficiently supersonic when measured in a frame that moves with the point [15]. Then the data behind the incoming waves define Riemann data for the downstream scattering of the interacting waves. A representative shock polar diagram for a regular shock diffraction producing a reflected Prandtl-Meyer wave is shown in Fig. 10.

Diffractions of these types have been studied experimentally by several investigators [1], [2], [17], [18], as well as numerically [5], [15]. Longer time simulations of the resulting surface instabilities in the fluid interface (called the Richtmyer-Meshkov instability [23], [28]) are found in [15], [24], [34]. One of the interferograms, Fig. 14 of [18], shows an irregular wave pattern that corresponds to what we call an anomalous reflection. In this wave the angle between the incident shock and the material interface is such that the state behind the shock has become subsonic.

Fig. 10 A sketch of the wave pattern and polar diagrams for a regular shock-contact diffraction that produces a reflected rarefaction wave.

We consider the perturbation of a regular shock diffraction that produces a reflected Prandtl-Meyer wave. Suppose that initially the state behind the incident shock is close to but slightly below the sonic point on the incident shock polar. We allow the incident angle to increase while keeping the other variables constant, so that the state behind the incident shock passes above the sonic point. Such a situation might occur as a shock diffracts through a bubble, as illustrated in Fig. 11. When this happens, the solution can no longer be self-similar, since a Prandtl-Meyer wave can only occur in supersonic flow. Instead, the reflected wave begins to overtake and interact with the incident shock, Fig. 11c. This interaction dampens and curves the incident shock near its base on the fluid interface, allowing the flow immediately behind the node to return to a supersonic condition. The single point of interaction bifurcates into a degenerate overtake node, where the leading edge of the reflected rarefaction overtakes the incident shock, and a sonic diffraction node at the fluid interface. This interaction is a two-dimensional version of the one-dimensional overtaking of a shock by a rarefaction. The composite configuration is in many ways analogous to a regular Mach reflection. In this case the reflected wave is a Prandtl-Meyer wave, and instead of a single point of Mach reflection the interaction is spread over the region where the rarefaction interacts with the incident shock. The “Mach” stem can be regarded as the entire region from the point where the incident shock is overtaken by the rarefaction to its base on the fluid interface.
Fig. 11 panels: (a) time 0.0 μsec; (b) time 0.15 μsec; scale 10 Δx × 10 Δy.

Fig. 11 The collision of a shock wave in water with an air bubble. The fluids ahead of the shock are at normal conditions of 1 atm pressure, with the density of water 1 g/cc and of air 0.0012 g/cc. The pressure behind the incident shock is 10 Kbar, with a shocked water density of 1.195 g/cc. The grid is 60 × 60.

If we allow the incident angle to increase further, we will eventually see a second bifurcation in the solution, Fig. 11d. As the material interface continues to diverge from the incident shock, the Mach number near the trailing edge of the reflected rarefaction continues to decrease. The characteristics behind the incident shock are almost parallel to the shock front near the base of the anomalous reflection. The flow there becomes nearly one-dimensional and the rarefaction wave eventually overtakes the incident shock. If there is a great difference in the acoustic impedance between the two materials, as in the numerical cases studied here, this second bifurcation will occur as the strength of the incident shock at the fluid interface reduces to zero. The now non-centered rarefaction breaks loose from the fluid interface and begins to propagate away. This second configuration is also analogous to a Mach reflection. Here the Mach node corresponds to the interaction region between the rarefaction and the incident shock, while the Mach stem is the degenerate wave portion from the trailing edge of the rarefaction to the fluid interface.

7.  The  Tracking  of  the  Anomalous  Reflection  Wave. 

The  qualitative  discussion  of  the  anomalous  reflection  in  the  previous  section  can 
be  incorporated  into  a  front  tracking  code  to  give  an  enhanced  resolution  of  the 
interaction. 

The tracking of a regular shock diffraction was described in [15]. The first step in the propagation is the computation of the velocity of the diffraction node with respect to the computational (lab) reference frame. Suppose at time $t$ the node is located at point $p_{00}$. The node position at time $t + dt$ is found by computing the intersection between the two propagated segments of the incident waves. If this new node position is $p_0$, then the node velocity is given by $(p_0 - p_{00})/dt$. This velocity defines the Galilean transformation into a frame where the node is at rest. When the state behind the incident shock is supersonic in this frame, it, together with the state on the opposite side of the fluid interface, provides data for a supersonic steady state Riemann problem whose solution determines the outgoing waves. The outgoing tracked waves are then modified to incorporate this solution.
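The node-velocity step and the supersonic test that decides between a regular diffraction and a bifurcation can be sketched in a few lines. The 2-D tuples and helper names below are hypothetical; in the tracking code the new position $p_0$ comes from intersecting the propagated incident waves, which is not reproduced here.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]

def node_velocity(p00: Vec, p0: Vec, dt: float) -> Vec:
    """Node velocity (p0 - p00)/dt with respect to the lab frame."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    return ((p0[0] - p00[0]) / dt, (p0[1] - p00[1]) / dt)

def node_frame_mach(flow_velocity: Vec, node_vel: Vec, sound_speed: float) -> float:
    """Mach number of a state after the Galilean shift into the node's rest frame."""
    rel = (flow_velocity[0] - node_vel[0], flow_velocity[1] - node_vel[1])
    return math.hypot(*rel) / sound_speed

def regular_diffraction_possible(state_velocity: Vec, sound_speed: float,
                                 p00: Vec, p0: Vec, dt: float) -> bool:
    """True if the state behind the incident shock is supersonic in the node frame.

    A False result signals the bifurcation discussed below (an anomalous
    reflection when the reflected wave is a Prandtl-Meyer wave).
    """
    v = node_velocity(p00, p0, dt)
    return node_frame_mach(state_velocity, v, sound_speed) > 1.0
```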

A  bifurcation  will  occur  if  the  calculated  node  velocity  is  such  that  the  state 
behind  the  incident  shock  is  subsonic  in  the  frame  of  the  node.  If  the  reflected 
wave  is  a  Prandtl-Meyer  wave  this  will  result  in  an  anomalous  reflection.  The  front 
tracking  implementation  of  this  bifurcation  is  a  straightforward  application  of  the 
analysis  described  in  the  previous  section. 

First the leading edge of the reflected rarefaction is allowed to break loose from the diffraction node. The intersection $p_1$ between the propagated rarefaction leading edge and the incident shock is computed, and a new overtake node is installed at $p_1$ by disconnecting the rarefaction leading edge from the diffraction node and connecting it to $p_1$.

If this reflected rarefaction edge is untracked, then $p_1$ is found by calculating the characteristic through the old node position corresponding to the state behind the incident shock and computing the intersection of its propagated position with the propagated incident shock. This characteristic makes the Mach angle $A$ with the streamline through the node. Since the bifurcation occurs between times $t$ and $t + dt$, $M > 1$ at time $t$ and $A$ is real. This wave moves with the sound speed in its normal direction. In this case no new overtake node is tracked.

We are now ready to compute the states and position of the point of shock diffraction after the bifurcation. As was mentioned previously, the rarefaction expands onto the incident shock, causing it to weaken. This in turn slows down the node, causing the incident shock to curve into the fluid interface. The diffraction node will slow down to the point where the state immediately behind the node becomes sonic. After this the configuration near the node can be computed using the regular case analysis.

Fig. 12 A diffraction node initially at $p_{00}$ bifurcates into an anomalous reflection. The predicted new node position at $p_0$ yields a Mach number of 0.984 behind the incident shock. The leading edge of the reflected Prandtl-Meyer wave breaks away from the diffraction node to form an overtake node at $p_1$. The propagated position of the diffraction node is adjusted to return the flow to sonic behind the node.

The adjusted propagated node position is computed as follows, see Fig. 12. For each sufficiently small number $s$, let $p(s)$ be the point on the propagated material interface that is located a distance $s$ from $p_0$ when measured along the curve, the positive direction being oriented away from the node into the region ahead of the incident shock. Let $\beta(s)$ be the angle between the tangent vector to the material interface at $p(s)$ and the directed line segment between the points $p(s)$ and $p_1$. Let $v(s)$ be the node velocity found by moving the diffraction node to position $p(s)$, and let $q(s)$ be the velocity of the flow ahead of the incident shock in the frame that moves with velocity $v(s)$ with respect to the lab frame. The mass flux across this shock is given by

$$m(s) = \rho_0 \, |q(s)| \sin\beta(s). \tag{7.1}$$

Given  m(s)  and  the  state  ahead  of  the  incident  shock,  the  state  behind  the  shock 
and  hence  its  Mach  number  M(s)  can  be  found.  The  new  node  position  is  given  by 
p(s*),  where  s*  is  the  root  of  the  equation  M(s*)  =  1.  Finally,  the  state  behind  the 
incident  shock  with  mass  flux  m(s*)  together  with  the  state  on  the  opposite  side 
of  the  contact  are  used  as  data  for  a  steady  state  Riemann  problem  whose  solution 
supplies  the  states  and  angles  of  the  transmitted  shock,  the  trailing  edge  of  the 
reflected  rarefaction,  and  the  downstream  material  interface. 
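The adjustment step is a scalar root-finding problem in the arc-length parameter $s$. A minimal sketch, assuming a caller-supplied function `mach_behind(s)` that carries out the chain $s \mapsto p(s) \mapsto v(s), q(s), \beta(s) \mapsto m(s) \mapsto M(s)$ (this helper is hypothetical, since the interface geometry lives in the tracking code) and a bracket on which $M(s) - 1$ changes sign. Bisection is our illustrative choice; the source does not state which root finder is used.

```python
import math
from typing import Callable, Tuple

def mass_flux(rho0: float, q: Tuple[float, float], beta: float) -> float:
    """Eq. (7.1): m(s) = rho0 * |q(s)| * sin(beta(s))."""
    return rho0 * math.hypot(*q) * math.sin(beta)

def sonic_node_parameter(mach_behind: Callable[[float], float],
                         s_lo: float, s_hi: float,
                         tol: float = 1e-10, max_iter: int = 200) -> float:
    """Return s* with mach_behind(s*) = 1 by bisection on [s_lo, s_hi].

    `mach_behind` is a hypothetical caller-supplied map s -> M(s).
    """
    f_lo = mach_behind(s_lo) - 1.0
    if f_lo * (mach_behind(s_hi) - 1.0) > 0:
        raise ValueError("M(s) - 1 does not change sign on the bracket")
    for _ in range(max_iter):
        s_mid = 0.5 * (s_lo + s_hi)
        f_mid = mach_behind(s_mid) - 1.0
        if f_lo * f_mid <= 0:
            s_hi = s_mid          # root lies in [s_lo, s_mid]
        else:
            s_lo, f_lo = s_mid, f_mid
        if s_hi - s_lo < tol:
            break
    return 0.5 * (s_lo + s_hi)
```

For instance, with a toy linear model $M(s) = 0.984 + 0.5s$ (the 0.984 echoes Fig. 12; the slope is invented for illustration), the root is $s^* = 0.032$.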

The  subsequent  propagation  of  the  anomalous  reflection  node  is  performed  in 
the  same  way.  The  bifurcation  repeats  itself  as  more  of  the  reflected  rarefaction 
propagates  up  the  incident  shock.  The  leading  edge  of  the  reflected  rarefaction 
wave  that  connects  to  the  diffraction  node  is  not  tracked  after  the  first  bifurcation. 

The secondary bifurcations that occur when the trailing edge of the rarefaction overtakes the incident shock are detected in two ways. If the incident shock
is  sufficiently  weak,  i.e.,  the  normal  shock  Mach  number  is  close  to  1,  then  it  is 
possible  for  the  numerically  calculated  upstream  Mach  number  to  be  less  than 
one.  This  is  a  purely  numerical  effect  since  physically  the  upstream  state  is  always 
supersonic.  However  in  nearly  sonic  cases  such  numerical  undershoot  can  occur. 
If  such  a  situation  is  detected  the  trailing  edge  of  the  reflected  rarefaction  wave 
is  disengaged  from  the  anomalous  reflection  node  and  installed  at  a  new  overtake 
node  on  the  incident  shock.  The  residual  shock  strength  for  the  portion  of  the 
incident  shock  behind  the  rarefaction  wave  is  small  and  the  diffraction  node  at  the 
material  interface  reduces  to  the  degenerate  case  of  a  sonic  signal  diffracting  through 
a  material  interface. 

The  second  way  in  which  the  secondary  bifurcation  is  detected  occurs  when  the 
trailing  edge  of  the  rarefaction  overtakes  the  shock.  Here  a  new  intersection  between 
the  incident  shock  and  the  trailing  edge  characteristic  is  produced.  As  before  the 
tracked  characteristic  is  disengaged  from  the  diffraction  node  and  a  new  overtake 
node  is  installed  at  the  point  of  intersection.  The  residual  shock  strength  at  the 
node  is  non-zero  so  the  diffraction  at  the  material  interface  produces  an  additional 
expansion  wave  behind  the  original  one.  This  new  expansion  wave  is  not  tracked. 

It  is  possible  to  make  a  few  remarks  about  the  amount  of  tracking  required  for 
these  problems.  Since  the  front  tracking  method  is  coupled  to  a  finite  difference 
method  for  the  solution  away  from  the  tracked  interface  (the  interior  solver),  there 
is always a choice between tracking a wave and allowing it to be captured. Of course, capturing can result in a considerable loss of resolution in the waves compared to tracking [9], but it also simplifies the resolution of the interactions. The
secondary  bifurcations  described  above  are  only  tracked  when  the  trailing  edge  of 
the  reflected  Prandtl-Meyer  wave  is  tracked.  The  current  algorithm  is  structured  so 
that  at  a  minimum  the  two  interacting  incoming  waves  are  tracked.  At  this  extreme 
none  of  the  outgoing  waves  are  tracked  and  no  explicit  bifurcations  in  the  tracked 
interface  occur.  More  commonly,  the  material  interface  separates  different  fluids 
and  so  must  be  tracked  on  both  sides  of  the  interaction.  Also,  instabilities  in  the 
finite  difference  approximation  can  affect  the  accuracy  of  the  solution  near  the  node, 
especially  for  stiff  materials  such  as  water.  Tracking  the  additional  waves  seems  to 
considerably  reduce  these  problems.  Tracking  also  allows  the  use  of  a  much  coarser 
grid,  which  is  important  when  the  diffraction  occurs  in  a  small  but  important  zone 
of a larger simulation. It allows the entire region of diffraction to extend over only a fraction of a grid block. These remarks show that the amount of tracking is
problem  dependent,  and  a  compromise  can  be  made  between  the  increased  accuracy 
and  stability  of  front  tracking,  and  the  simplicity  of  a  capturing  algorithm. 

8.  Numerical  Examples. 

Fig.  13  shows  a  series  of  frames  documenting  the  collision  of  a  10  Kbar  shock  wave 
with  a  bubble  of  air  in  water.  Note  in  this  case  the  trailing  edge  of  the  reflected 
Prandtl-Meyer  wave  is  not  tracked.  The  states  ahead  of  the  incident  shock  are  at 
one  atmosphere  pressure  and  standard  temperature.  Under  these  conditions,  water 
is  about  a  thousand  times  as  dense  as  air.  During  the  initial  stage  of  the  interaction 
regular  diffraction  patterns  are  produced. 

In  less  than  half  of  a  microsecond  an  anomalous  reflection  has  formed,  and  by 
one  microsecond  the  trailing  edge  of  the  rarefaction  has  also  overtaken  the  incident 
shock.  It  is  interesting  to  note  that  this  interaction  causes  the  bubble  to  collapse 
into  itself.  Long  time  simulations  are  expected  to  show  the  initial  bubble  split, 
and  the  resulting  bubbles  going  into  oscillation  as  they  are  overcompressed  and 
then  expand.  This  process  is  important  in  the  transfer  of  energy  as  a  shock  passes 
through  a  bubbly  fluid.  The  first  diffraction  considerably  dampens  the  shock,  and 
much  of  this  energy  will  eventually  be  returned  to  the  shock  wave  in  the  form  of 
compression  waves  generated  by  the  expanding  bubbles. 

Fig.  14  shows  the  diffraction  of  an  expanding  underwater  shock  wave  through 
the  water’s  surface.  Initially  a  ten  Kbar  cylindrically  expanding  shock  wave  with  a 
radius  of  one  meter  is  placed  two  meters  below  the  water’s  surface.  The  interior  of 
the  shock  wave  contains  a  bubble  of  hot  dense  gas.  The  states  exterior  to  the  shock 
are  ambient  at  one  atmosphere  pressure  and  normal  temperature.  A  gravitational 
acceleration of one g has been added in this case, but due to the rapid time scale
on which the diffractions occur the effect of gravity is negligible. Here the entire
reflected Prandtl-Meyer wave is captured rather than tracked. The pressure contour
plots show that by six milliseconds an anomalous reflection has developed, as
indicated in the blowup of Fig. 14b shown in Fig. 15. Another interesting feature of
this problem is the acceleration of the bubble inside the shock wave by the reflected
rarefaction wave. This causes the bubble to rise much faster than it would under
gravity alone. When the bubble reaches the surface it expands into the atmosphere,
leading to the formation of a kink in the transmitted shock wave between the region
ahead of the surfacing bubble and the rest of the wave. This kink is an untracked
example of the elementary wave called the cross node, where two oblique shocks
collide.

Fig. 13 Log(1 + pressure) contours for the collision of a shock wave in water with an air bubble: (a) time 0.0; (b) time 0.15 μsec; scale 10 Δx = 10 Δy. The fluids ahead of the shock are at normal conditions of 1 atm. pressure, with the density of water 1 g/cc and air 0.0012 g/cc. The pressure behind the incident shock is 10 Kbar with a shocked water density of 1.195 g/cc. The tracked interface is shown in a dark line. The grid is 60 x 60.

Fig. 14 An underwater expanding shock wave diffracting through the water’s surface: (a) time 0.0 msec; (b) time 6.0 msec; scale 25 Δx = 25 Δy. An expanding shock wave with an internal pressure of 10 Kbars and initial radius of 1 meter is installed at a depth of 2 meters below the water's surface. The external conditions are ambient at one atmosphere pressure and normal densities for the air and water. The boundary conditions are constant Dirichlet at the initial ambient values. The grid is 150 x 150.

The water in the simulations described above is modeled by the stiffened polytropic
equation of state with $\Gamma_0 = 6$, $e_\infty = 0$, and $p_\infty = 3000$ atm. The air is
treated as a polytropic gas with $\gamma = 1.4$.
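These parameters can be sanity-checked with a short sketch. It assumes the standard stiffened-gas form, in which $\Gamma_0 = \gamma - 1$ (so $\gamma = 7$ for water), the sound speed is $c^2 = \gamma (p + p_\infty)/\rho$, and the Rankine-Hugoniot jump is that of an ideal gas in the shifted pressure $p + p_\infty$; the constant and function names are our own. It reports ambient sound speeds, the water/air acoustic impedance ratio, and the density behind a 10 Kbar shock in water.

```python
import math

ATM = 101325.0           # 1 atm in Pa
KBAR = 1.0e8             # 1 Kbar in Pa

# Stiffened-gas parameters (assumption: Gamma0 = gamma - 1, so gamma = 7 for water)
GAMMA_WATER, P_INF_WATER = 7.0, 3000.0 * ATM
GAMMA_AIR, P_INF_AIR = 1.4, 0.0      # a polytropic gas is the p_inf = 0 case


def sound_speed(p, rho, gamma, p_inf):
    """c^2 = gamma (p + p_inf) / rho for a stiffened gas."""
    return math.sqrt(gamma * (p + p_inf) / rho)


def shocked_density(rho0, p0, p1, gamma, p_inf):
    """Rankine-Hugoniot density behind a shock raising the pressure from p0 to p1.

    In the shifted pressure p + p_inf the stiffened gas has the ideal-gas Hugoniot."""
    a, b = p1 + p_inf, p0 + p_inf
    return rho0 * ((gamma + 1) * a + (gamma - 1) * b) / ((gamma - 1) * a + (gamma + 1) * b)


rho_w, rho_a = 1000.0, 1.2           # kg/m^3, i.e. 1 g/cc and 0.0012 g/cc
c_w = sound_speed(ATM, rho_w, GAMMA_WATER, P_INF_WATER)
c_a = sound_speed(ATM, rho_a, GAMMA_AIR, P_INF_AIR)
impedance_ratio = (rho_w * c_w) / (rho_a * c_a)
rho_shocked = shocked_density(rho_w, ATM, 10 * KBAR, GAMMA_WATER, P_INF_WATER)

print(f"c_water = {c_w:.0f} m/s, c_air = {c_a:.0f} m/s")
print(f"acoustic impedance ratio water/air = {impedance_ratio:.0f}")
print(f"shocked water density = {rho_shocked / 1000:.3f} g/cc")
```

Under these assumptions the shocked water density comes out at about 1.195 g/cc, the value quoted in the caption of Fig. 13. The impedance ratio is in the thousands, which is what makes the water-to-air diffraction a high-to-low impedance problem.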

9.  Summary. 

We have seen that the process of bubble growth and interaction in gravity driven
mixing can be modeled using a simplified description of the bubble dynamics, at
least in the small compressibility regime. In this regime, the model agrees with
experiments and computer simulations well into the chaotic regime. It should be
possible to include results of this simplified model in statistical models that can
study the interaction of large numbers of bodies.

We also studied the diffraction of a shock through a material interface from a
medium of high to low acoustic impedance. The bifurcations that occur during
the diffraction were analyzed in terms of polar diagrams for steady supersonic flow.
This analysis was incorporated into a front tracking code to allow enhanced
resolution computations of the interactions. The particular simulations studied were
the diffraction of a planar shock in water through an air bubble, and the diffraction
of an expanding shock in water through the water’s surface. In both cases the
anomalous reflection bifurcation plays an important role in correctly computing the
flow.

Fig. 15 A blowup of Fig. 14b showing pressure contours scaled from 0.001 to 10 Kbars. The tracked interface is shown superimposed in a dark line over the pressure contours. Scale: 25 Δx = 25 Δy.

Abstract

The  characteristics  of  a  single  streamwise  vortex  embedded  in  Poiseuille  flow  have  been 
analysed  both  numerically  and  analytically.  On  short  time  scales,  velocity  profiles  are  found  to 
evolve  similarly  to  those  previously  derived  in  unbounded  Couette  flow.  As  wall  effects  begin  to 
be  felt  at  later  times,  a  perturbation  solution  is  derived  whose  profiles  and  decay  rates  are  found 
to  agree  with  calculations.  Both  solutions  are  used  to  answer  questions  about  the  strength  and 
motion  of  the  vortex,  the  presence  and  strength  of  any  counter-rotating  vortices  induced  at  the 
viscous  wall,  and  the  possible  (or  impossible)  role  of  such  induced  vortices  in  transition.  Time 
scale  arguments  are  used  to  make  inferences  about  the  stability  of  various  initial  distributions 
of  streamwise  vorticity,  and  in  particular  to  derive  a  neutral  curve  (critical  Reynolds  number  vs 
vertical  position  in  flow)  with  a  minimum  critical  Reynolds  number  close  to  1000. 

1 INTRODUCTION


Laminar-turbulent transition is still an incompletely understood phenomenon. Observations disprove
the conjecture that transition is governed by the Reynolds number, R. Linear stability theory
correctly explains phenomena within its regime (eg, instability of plane parabolic flow to infinitesimal
perturbations above a critical Reynolds number $R_c = 5772$ based on surface velocity and flow depth),
but it turns out that Tollmien-Schlichting waves are neither necessary nor sufficient for transition. They
are not necessary, since subcritical transition is observed at R down to about 1000, and not sufficient, in
the sense that, when present, their most unstable mode grows on a diffusive time scale, and in the absence
of a secondary three-dimensional perturbation these modes do not become significant until very
large times (eg, Nishioka et al, 1975). 2-D finite amplitude perturbations predict $R_c \approx 2800$ in plane
parabolic flow, but fail to account for observed transition down to $R_c \approx 1000$, and produce profiles
that always saturate in amplitude before decaying back to laminar (Herbert, 1977). Additionally,
some other flows (eg, plane Couette) exhibit no 2-D finite amplitude growing modes. 2-D finite
amplitude waves have been found to have a strong secondary instability in the presence of a small
3-D disturbance at $R > 1200$, in rough agreement with observations (Orszag and Patera, 1983),
but there is no rational mechanism to generate these finite amplitude waves in a subcritical flow.
Statistical methods have the problem of closure from a theoretical standpoint; empirically, they
perform well in known regimes, but extrapolate poorly to new ones, and tell us little of the physics
of transition. At a very practical level, an understanding of the physical processes involved is
desirable for use when, for example, designing for reduced drag or designing for increased mixing.

Much  attention  has  been  devoted  recently  to  investigating  the  role  of  streamwise  vorticity  in 
shear  flow  transition.  The  experiments  of  Klebanoff  et  al  (1962)  showed  that  2-D  waves  developed  a 
spanwise  waviness  that  generated  3-D  streamwise  vortices,  which  in  turn  broke  down  as  part  of  the 
transition cycle. Taylor vortices, Görtler vortices, the trailing vortices from a boundary roughness
element, and sweeps from the outer flow that pull up spanwise vortex tubes, which are subsequently
stretched by the shear to form hairpin vortices, are all observed to be possible precursors to
transition. The common element in all is the presence of the streamwise vortex in shear flow (eg,
Figure 1).

As the simplest abstraction, Pearson and Abernathy (1984) investigated a single infinitely long
vortex aligned in the flow direction in an unbounded uniform shear flow (Figure 2). They found that
the full 3-D incompressible Navier-Stokes equations reduced to one ODE in a similarity variable.
This solution depends on the vortex Reynolds number $R_v$ (a measure of the vortex strength;
specifically, the ratio of the circulation rate to the diffusion rate) and not on the flow Reynolds number,
R. The flow Reynolds number sets the local shear, which in turn sets the time scale of the instability.
Figure 3 shows the perturbed streamwise velocity profiles caused by vortices of different strengths.
The effect of the vortex is to rotate low velocity fluid from bottom to top and high velocity fluid
from top to bottom. The resulting inflectional profiles are reminiscent of those treated in inviscid
stability theory. A linear stability analysis of these profiles in a viscous fluid shows them to be
unstable for $R_v > 2$ to 3 (and for any R). Yang and Abernathy (1987) generated what they believed
to be a single vortex in plane parabolic flow and produced similar velocity profiles, but observed
additional structure near the bottom viscous wall. Suri and Abernathy (1988) investigated the effect
of a diffuse array of vortices above a viscous wall, but found profiles that took much longer to develop
and were less inflectional.

These  results  raise  additional  questions.  Do  Pearson’s  inflectional  profiles  generalize  to  more 
realistic  (eg,  bounded,  non-Oseen  vortex)  flows?  Or  are  these  profiles  just  peculiar  artifacts  of 
starting  with  a  delta  function  of  vorticity,  with  more  realistic  initial  conditions  yielding  much  less 
perturbation  of  the  streamwise  flow  field?  Even  if  more  realistic  initial  conditions  yield  similar 
profiles,  does  the  viscous  wall  reduce  circulation  enough  to  make  the  vortices  (and  thus  their  reduced 
inflectional  profiles)  just  innocuous  observers  during  the  transition  sequence?  How  do  Suri’s  velocity 
profiles  tie  in  with  those  of  Pearson?  Does  a  single  streamwise  vortex  kick  up  enough  added  structure 
at the viscous wall so as to not remain a single vortex, and thereby cast doubt on Yang’s conclusions
about  the  role  of  the  single  vortex  in  transition?  If  R  is  not  the  controlling  factor  in  transition,  what 
is  its  role?  This  paper  attempts  to  answer  these  questions. 


Figure  1:  Boundary  layer  transition  on  a  flat  plate  (Werle  1980).  Note  the  initially  counter-rotating 
vortex  pair  at  left,  as  evidenced  by  the  contracting  flow  near  the  plate  and  the  upflow  region  at  the 
center. 

Figure 2: Pearson and Abernathy’s (1984) undisturbed flow configuration. Flow is an unbounded
uniform shear. A line vortex of strength $\Omega_0$ is inserted, coincident with the x axis, at time zero.


Figure 3: Dimensionless vertical profiles of the streamwise velocity after being perturbed by
streamwise vortices of various strengths $R_v$. Profiles are calculated after a constant time interval.
From Pearson and Abernathy (1984).

2 PROBLEM FORMULATION

2.1  GEOMETRY 

We begin by adding another layer of realism to Pearson: we consider a bounded region with a viscous
wall, a parabolic mean profile, and initial distributions of vorticity other than potential. Specifically,
we consider Poiseuille flow of infinite extent in the streamwise (x) and spanwise (z) directions over
a viscous plane. A free surface is located at height h with streamwise velocity $u_0$. The wall (and
x axis) is at an angle θ from horizontal, and it is the body force due to the component of gravity
in the x direction that generates the mean velocity profile. Embedded in the mean flow is a periodic
array of vortex pairs aligned in the streamwise direction, each of initial circulation $\Gamma_0$. The ratio of
the average vortex spacing to flow depth is the aspect ratio, $a$ (see Figure 4). The case of a single
streamwise vortex in an infinitely wide box is obtained in the limit $a \to \infty$. Since we will often be
interested in this limit, we plot the independent variable $1/a$ rather than $a$ in order to put this limit
at the origin. Vertical streamlines dividing a vortex from its images on the left and right are located
at $z = \pm a/2$. These streamlines can equivalently be thought of as slippery walls, slippery in the sense
that the boundary conditions (Section 2.4 below) require no shear stress on them. For later reference,
we denote the left wall as $\partial_{left}$, the free surface as $\partial_{top}$, the right wall as $\partial_{right}$, and the bottom wall
as $\partial_{bottom}$.


2.2  ASSUMPTIONS 

1. Like Pearson, Yang, and Suri, we assume no x dependence in the flow at t = 0. This is based
on the observation that the flow variables change slowly in the x direction compared to changes
in the y and z directions (eg, Figure 1).

2. We assume that the free surface is constrained by a normal force to a horizontal plane. This
assumption allows us to eliminate one variable (free surface position) from the equations, but
also means that additional circulation is lost through the constrained free surface (which is
now essentially another slippery boundary caused by an image vortex). This assumption seems
warranted based on experimental observations; in Appendix B, we demonstrate its consistency.

3. At t = 0 we are capable of using any initial distribution of vorticity, but for simplicity of
analysis we choose to use either a delta function of vorticity (like Pearson and Yang), a fully
diffused (in a sense to be explained later) vortex (like Suri), or a cylindrical top-hat function
of vorticity as a new intermediate case.


2.3  EQUATIONS 


We begin with the 3-D incompressible Navier-Stokes equations. We nondimensionalize with
respect to the characteristic variables listed in Table 1. We choose these characteristic quantities on
the purely pragmatic grounds that they result in the greatest simplification in the appearance of
subsequent formulas. All variables throughout this paper are dimensionless, except where noted.

The lack of x dependence at t = 0 implies that there will be no x dependence at any later
time. This eliminates the streamwise velocity u from the crossflow equations, resulting in a partial
decoupling first noticed by Mitchner (1952) and by Stuart (1965). This allows a stream
function-vorticity formulation in the crossflow directions. The equations become:


Crossflow:

$$\omega_t + v\,\omega_y + w\,\omega_z = \frac{1}{R_v}\left(\omega_{yy} + \omega_{zz}\right) \qquad (1)$$

Table 1: Characteristic variables and nondimensional parameters

Quantity           Crossflow (Rv ≠ 0)   Streamwise (Rv ≠ 0)   Streamwise (Rv = 0)
Time               [illegible]          [illegible]           [illegible]
Length             h                    h                     h
Velocity           [illegible]          u0                    u0
Acceleration       [illegible]          [illegible]           [illegible]
Stream function    [illegible]          -                     -
Potential          [illegible]          -                     -
Vorticity          [illegible]          [illegible]           [illegible]
Circulation        [illegible]          u0 h                  u0 h
Pressure/stress    [illegible]          [illegible]           [illegible]

Nondimensional parameters are:

Rv: ratio of spanwise convective effects to diffusive effects
R: ratio of streamwise convective effects to diffusive effects

$$\psi_{yy} + \psi_{zz} = -\omega, \qquad w = \psi_y, \qquad v = -\psi_z \qquad (2)$$

Streamwise:

$$u_t + v\,u_y + w\,u_z = \frac{2}{R_v} + \frac{1}{R_v}\left(u_{yy} + u_{zz}\right) \qquad (3)$$


The  body  force  term  in  the  streamwise  direction  is  the  component  of  gravity  in  the  x  direction 
that  establishes  and  works  to  maintain  the  original  parabolic  profile. 


2.4  BOUNDARY  CONDITIONS 

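On $\partial_{left}$, $\partial_{top}$, and $\partial_{right}$ (slippery walls), the shear stress vanishes. On $\partial_{bottom}$ (viscous wall), $u = v = w = 0$. These
yield the boundary conditions:

Equation 1: $\omega = 0$ on $\partial_{left}$, $\partial_{top}$, $\partial_{right}$; $w = 0$ on $\partial_{bottom}$
Equation 2: $\psi = 0$ on $\partial_{left}$, $\partial_{top}$, $\partial_{right}$, $\partial_{bottom}$
Equation 3: $u_y = 0$ on $\partial_{top}$; $u_z = 0$ on $\partial_{left}$, $\partial_{right}$; $u = 0$ on $\partial_{bottom}$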

2.5  INITIAL  CONDITIONS 

We initialize in both the streamwise and crossflow directions:

Equation 1: $\omega = \omega_0(z, y)$ (for any specified distribution $\omega_0$)
Equation 3: $u = 2y - y^2$

Figure 4: Geometry adopted for this investigation. We consider an initially parabolic flow of infinite
extent in x and z above a plane viscous wall. At time zero, an array of counter-rotating vortex pairs
is begun, aligned in the x direction. Similar to previous flow table experiments, except the free
surface is constrained to a flat plane.




3  COMPUTATIONAL  SOLUTION 

At this point, there is no obvious way to attack the problem analytically. Unlike Pearson’s
exploitation of the Oseen vortex, we have no expression for the evolving bounded vortex, and because of
the boundaries there does not seem to be much likelihood of finding a similarity solution for the
streamwise velocity. Therefore we turn to a computational solution of the above equations. The
methodology chosen is:

• Initialize

• Step forward ω on the interior (Equation 1)

• Step forward u (Equation 3)

• Solve for ψ (Equation 2)

• Calculate ω on the viscous boundary (Thom’s method)

We loop through as many time steps as desired.
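The loop above can be sketched in a few dozen lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' code: a uniform grid in a square box ($a = 1$), explicit Euler time stepping with central differences, Jacobi iteration for Equation 2, a one-sided difference for $u_y = 0$ on the free surface, and Thom's formula $\omega_{wall} = -2\psi_1/h^2$, which follows from $\psi = \psi_y = 0$ at the wall together with $\nabla^2\psi = -\omega$. A small Gaussian vortex of unit circulation stands in for a potential vortex, and the streamwise body force is taken as $2/R_v$, the value that keeps $u = 2y - y^2$ steady. All function names and parameter values are hypothetical.

```python
import numpy as np


def laplacian(f, h):
    """Five-point Laplacian at interior nodes (uniform spacing h in y and z)."""
    return (f[2:, 1:-1] + f[:-2, 1:-1] + f[1:-1, 2:] + f[1:-1, :-2]
            - 4.0 * f[1:-1, 1:-1]) / h**2


def grad(f, h):
    """Central differences (f_y, f_z) at interior nodes; axis 0 is y, axis 1 is z."""
    return ((f[2:, 1:-1] - f[:-2, 1:-1]) / (2.0 * h),
            (f[1:-1, 2:] - f[1:-1, :-2]) / (2.0 * h))


def solve_psi(psi, omega, h, iters):
    """Jacobi sweeps for psi_yy + psi_zz = -omega (Eq. 2) with psi = 0 on all walls."""
    for _ in range(iters):
        psi[1:-1, 1:-1] = 0.25 * (psi[2:, 1:-1] + psi[:-2, 1:-1] + psi[1:-1, 2:]
                                  + psi[1:-1, :-2] + h**2 * omega[1:-1, 1:-1])
    return psi


def circulation(omega, h):
    """Trapezoidal estimate of the integral of omega over the box (Green's theorem)."""
    wts = np.ones_like(omega)
    wts[[0, -1], :] *= 0.5
    wts[:, [0, -1]] *= 0.5
    return h * h * float(np.sum(wts * omega))


def run(omega, u, h, dt, Rv, nsteps, poisson_iters=30):
    """Advance omega (Eq. 1) and u (Eq. 3) in place; returns (omega, u, psi)."""
    psi = solve_psi(np.zeros_like(omega), omega, h, 3000)      # initialize
    for _ in range(nsteps):
        psi_y, psi_z = grad(psi, h)
        w, v = psi_y, -psi_z                                     # w = psi_y, v = -psi_z
        # Step forward omega on the interior (Eq. 1), explicit Euler
        om_y, om_z = grad(omega, h)
        omega[1:-1, 1:-1] += dt * (-v * om_y - w * om_z + laplacian(omega, h) / Rv)
        omega[-1, :] = 0.0                                       # slippery top
        omega[:, 0] = 0.0                                        # slippery side walls
        omega[:, -1] = 0.0
        # Step forward u (Eq. 3); the 2/Rv body force balances diffusion of 2y - y^2
        u_y, u_z = grad(u, h)
        u[1:-1, 1:-1] += dt * (-v * u_y - w * u_z + (2.0 + laplacian(u, h)) / Rv)
        u[-1, :] = (4.0 * u[-2, :] - u[-3, :]) / 3.0            # u_y = 0 on the free surface
        u[:, 0] = u[:, 1]                                        # u_z = 0 on side walls
        u[:, -1] = u[:, -2]
        u[0, :] = 0.0                                            # no slip on the bottom
        # Solve for psi (Eq. 2), warm-started from the previous step
        psi = solve_psi(psi, omega, h, poisson_iters)
        # Thom's formula for omega on the viscous wall
        omega[0, :] = -2.0 * psi[1, :] / h**2
    return omega, u, psi


N, Rv, r0 = 41, 5.0, 0.08          # grid size, vortex Reynolds number, core radius (assumed)
h = 1.0 / (N - 1)                  # a = 1: y in [0, 1], z in [-1/2, 1/2]
y = np.linspace(0.0, 1.0, N)[:, None]
z = np.linspace(-0.5, 0.5, N)[None, :]
omega = np.exp(-((y - 0.5)**2 + z**2) / r0**2) / (np.pi * r0**2)   # unit circulation
u = np.repeat(2.0 * y - y**2, N, axis=1)                            # parabolic mean flow
gamma0 = circulation(omega, h)
omega, u, psi = run(omega, u, h, dt=2e-4, Rv=Rv, nsteps=1)
gamma1 = circulation(omega, h)
omega, u, psi = run(omega, u, h, dt=2e-4, Rv=Rv, nsteps=200)
gamma_end = circulation(omega, h)
print(f"circulation: {gamma0:.3f}, {gamma1:.3f} after one step, {gamma_end:.3f} at the end")
```

Because the discrete Poisson problem is symmetric in a square box with the vortex at its center, the first application of Thom's formula should remove roughly one quarter of the circulation, in line with the 75% benchmark discussed in Section 4. With no vortex present, the laminar profile stays steady, which is a quick check that the body force and boundary treatment are consistent.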

4  BENCHMARKS 

An  exact  solution  by  Taylor  (1923)  and  Suri  (1988)  for  the  spanwise  flowfield  of  a  certain  vortex 
will  serve  as  a  useful  point  of  reference  in  places  to  come,  so  we  present  it  here.  The  problem  is 
similar  to  the  crossflow  part  of  our  problem  listed  above  (Equations  1  and  2),  except  for  the  bottom 
boundary  condition  and  the  initial  condition: 


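$$\omega_t + v\,\omega_y + w\,\omega_z = \frac{1}{R_v}\left(\omega_{yy} + \omega_{zz}\right) \qquad (4)$$

$$\psi_{yy} + \psi_{zz} = -\omega \qquad (5)$$

$$w = \psi_y, \qquad v = -\psi_z$$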

BOUNDARY  CONDITIONS: 

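Boundary conditions differ in that the bottom is another slippery plane, like the top:

Equation 4: $\omega = 0$ on $\partial_{left}$, $\partial_{top}$, $\partial_{right}$, $\partial_{bottom}$
Equation 5: $\psi = 0$ on $\partial_{left}$, $\partial_{top}$, $\partial_{right}$, $\partial_{bottom}$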


INITIAL  CONDITIONS: 


$$\omega_0 = \frac{\pi^2}{2a}\cos\frac{\pi z}{a}\cos\pi y \qquad (6)$$


Once the above problem has been solved, the pressure p is obtained to within an additive constant
from:

$$p_z = -w_t - v\,w_y - w\,w_z + \frac{1}{R_v}\left(w_{yy} + w_{zz}\right)$$

$$p_y = -v_t - v\,v_y - w\,v_z + \frac{1}{R_v}\left(v_{yy} + v_{zz}\right)$$
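
The solution to the above problem is:

$$\omega = \frac{\pi^2}{2a}\cos\frac{\pi z}{a}\cos\pi y\,\exp\left[-\frac{(1 + 1/a^2)\pi^2 t}{R_v}\right] \qquad (7)$$

$$\psi = \frac{1}{2a(1 + 1/a^2)}\cos\frac{\pi z}{a}\cos\pi y\,\exp\left[-\frac{(1 + 1/a^2)\pi^2 t}{R_v}\right] \qquad (8)$$

$$w = -\frac{\pi}{2a(1 + 1/a^2)}\cos\frac{\pi z}{a}\sin\pi y\,\exp\left[-\frac{(1 + 1/a^2)\pi^2 t}{R_v}\right] \qquad (9)$$

$$v = \frac{\pi}{2a^2(1 + 1/a^2)}\sin\frac{\pi z}{a}\cos\pi y\,\exp\left[-\frac{(1 + 1/a^2)\pi^2 t}{R_v}\right] \qquad (10)$$

$$p = -\frac{\pi^2}{16a^2(1 + 1/a^2)^2}\left(\cos\frac{2\pi z}{a} + \frac{1}{a^2}\cos 2\pi y\right)\exp\left[-\frac{2(1 + 1/a^2)\pi^2 t}{R_v}\right] \qquad (11)$$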


This is an exact solution for the given initial condition, and our computations show that it is
asymptotically correct for others (above a non-viscous bottom surface); for example, Figure 5 shows
the residual between Suri’s solution and our computational solution for an initially potential vortex.
Suri’s solution becomes an increasingly better approximation as time increases.
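A quick way to check a reading of Equations 7 to 11 is to substitute them back into Equations 4 and 5 and the two pressure relations. The sketch below does this with finite differences at random points, for arbitrary sample values of $a$ and $R_v$ chosen by us; the residuals should vanish to within truncation error. The nonlinear terms drop out because $\omega$ is proportional to $\psi$, so each field simply decays at the diffusive rate $(1 + 1/a^2)\pi^2/R_v$. Names such as `fields` and `momentum_residual` are hypothetical.

```python
import numpy as np

A, RV = 1.5, 5.0                 # sample aspect ratio a and vortex Reynolds number R_v
OMEGA, PSI, W, V, P = range(5)


def fields(z, y, t):
    """Equations 7-11 as read here; returns (omega, psi, w, v, p)."""
    E = np.exp(-(1.0 + 1.0 / A**2) * np.pi**2 * t / RV)
    D = 2.0 * A * (1.0 + 1.0 / A**2)
    cz, sz = np.cos(np.pi * z / A), np.sin(np.pi * z / A)
    cy, sy = np.cos(np.pi * y), np.sin(np.pi * y)
    return (np.pi**2 / (2.0 * A) * cz * cy * E,                      # omega (7)
            cz * cy * E / D,                                           # psi   (8)
            -np.pi / D * cz * sy * E,                                  # w     (9)
            np.pi / (A * D) * sz * cy * E,                             # v     (10)
            -np.pi**2 / (4.0 * D**2) * (np.cos(2 * np.pi * z / A)
                                        + np.cos(2 * np.pi * y) / A**2) * E**2)  # p (11)


def d1(k, var, z, y, t, h=1e-4):
    """Central-difference first derivative of field k in 'z', 'y' or 't'."""
    dz, dy, dt = {"z": (h, 0.0, 0.0), "y": (0.0, h, 0.0), "t": (0.0, 0.0, h)}[var]
    return (fields(z + dz, y + dy, t + dt)[k] - fields(z - dz, y - dy, t - dt)[k]) / (2 * h)


def lap(k, z, y, t, h=1e-3):
    """Five-point Laplacian in (y, z) of field k."""
    f = lambda dz, dy: fields(z + dz, y + dy, t)[k]
    return (f(h, 0.0) + f(-h, 0.0) + f(0.0, h) + f(0.0, -h) - 4.0 * f(0.0, 0.0)) / h**2


def momentum_residual(k, z, y, t):
    """q_t + v q_y + w q_z - lap(q)/R_v for q = field k."""
    _, _, w, v, _ = fields(z, y, t)
    return (d1(k, "t", z, y, t) + v * d1(k, "y", z, y, t) + w * d1(k, "z", z, y, t)
            - lap(k, z, y, t) / RV)


rng = np.random.default_rng(0)
z = rng.uniform(-A / 2, A / 2, 200)
y = rng.uniform(-0.5, 0.5, 200)
t = rng.uniform(0.0, 1.0, 200)
residuals = {
    "Eq. 5": lap(PSI, z, y, t) + fields(z, y, t)[OMEGA],
    "Eq. 4": momentum_residual(OMEGA, z, y, t),
    "p_z": d1(P, "z", z, y, t) + momentum_residual(W, z, y, t),
    "p_y": d1(P, "y", z, y, t) + momentum_residual(V, z, y, t),
}
worst = max(float(np.max(np.abs(r))) for r in residuals.values())
print({name: float(np.max(np.abs(r))) for name, r in residuals.items()})
```

The same harness can be pointed at a numerical solution instead of the closed form, which is essentially what the comparison in Figures 5 and 6 does.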

By making the computational bottom wall slippery, we can compare the results of the above
numerical procedure with Suri’s exact solution. Figure 6 shows the evolution of ω at z = 0 for $a = 1$
and $R_v = 5$ as given by our numerical solution; there is no visible difference (except discretization)
between this and a plot of Suri’s solution. The same holds true for plots of other flow variables.

In  Appendix  A,  we  derive  a  solution  for  a  vortex  above  a  viscous  wall  which  is  analogous  to 
Equations 7 through 11 above. For example, the zeroth order expression for the vorticity is:

[expression illegible in source]

where k is determined from

[equation illegible in source]

This solution is similar to Equation 7 in that both share a cosine in z and both have a trigonometric
function in y that goes to zero at $\partial_{top}$. Suri’s function (a cosine) goes to zero on $\partial_{bottom}$, yielding
a slippery bottom surface, whereas our function (a sine with displaced argument) has a shorter
period, introducing just enough negative vorticity near $\partial_{bottom}$ to enforce the viscous boundary
condition. This solution above a viscous boundary serves as another benchmark; the residual between
it and our computational solution is presented in Appendix A. Again, agreement is very good.

Another test against an analytic result can be made by making the computational bottom wall
viscous and starting a potential vortex at the center of a square ($a = 1$) box. Since $\Gamma = \oint \mathbf{v} \cdot d\mathbf{l}$ and
the crossflow velocity is symmetric on the four walls, each wall contributes one quarter of the
circulation. Turning on the viscous wall at t = 0 forces the velocity on the bottom wall to zero, so it
will instantaneously decrease the circulation to exactly 75% of its initialized value. Numerically, we
measure the circulation via Green’s Theorem, $\Gamma = \iint \omega\, dA$, and find the predicted drop to occur to
a high degree of accuracy (eg, Figure 33). The circulation is immediately decreased by 25% due to the
vortex sheet of opposite-signed vorticity generated when the viscous wall is activated.
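The quarter-per-wall argument can be illustrated with a free-space point vortex at the center of a unit square. This is an idealization of our own: the boxed vortex in the text also has image contributions, but it shares the same four-fold symmetry. The sketch integrates the tangential velocity along each side with the midpoint rule; the function name is hypothetical.

```python
import numpy as np


def side_circulation(gamma, n=20000):
    """Counterclockwise integral of v . dl along each side of the unit square
    centred on a free point vortex of circulation gamma (midpoint rule)."""
    s = (np.arange(n) + 0.5) / n - 0.5        # midpoints of n cells on [-1/2, 1/2]
    ds = 1.0 / n

    def velocity(z, y):
        """(w, v) = (z-velocity, y-velocity) of a counterclockwise point vortex."""
        r2 = z**2 + y**2
        return -gamma * y / (2 * np.pi * r2), gamma * z / (2 * np.pi * r2)

    bottom = np.sum(velocity(s, -0.5)[0]) * ds   # y = -1/2, traversed in +z
    right = np.sum(velocity(0.5, s)[1]) * ds     # z = +1/2, traversed in +y
    top = -np.sum(velocity(s, 0.5)[0]) * ds      # y = +1/2, traversed in -z
    left = -np.sum(velocity(-0.5, s)[1]) * ds    # z = -1/2, traversed in -y
    return float(bottom), float(right), float(top), float(left)


sides = side_circulation(1.0)
print(sides)            # each entry is very close to 0.25
print(sum(sides[1:]))   # bottom-wall velocity set to zero: close to 0.75
```

Each side carries one quarter of the circulation, so zeroing the velocity on the bottom wall leaves three quarters, which is the 75% figure used above.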



Figure 5: Residual between Suri’s solution (Equation 7) beginning from a fully diffused vortex
and our computational solution for flow above a slippery wall beginning from an initially potential
vortex. After sufficient time the actual initial condition is irrelevant, and Suri’s solution is a good
approximation. Dotted line is the circulation of an initially potential vortex for comparison.

Figure 6: Vorticity of a fully diffused vortex above a slippery wall at different times as calculated
by our computational procedure, superimposed on Suri's exact solution (Equation 7). Agreement is
good enough to make the two sets of curves coincident.

5 STREAMWISE VELOCITY PROFILES


At short times, our numerical solution produces streamwise velocity profiles that, locally, are
qualitatively and quantitatively the same as those of Pearson and Yang (Figures 7 to 10). At a vortex
strength of $R_v \approx 6.5$, we calculate a vertical slope of zero ($\partial u/\partial y = 0$), in agreement with Pearson.
Velocity profiles become more inflectional at higher $R_v$. The streamwise velocity profiles are slightly
asymmetric (the magnitude of the perturbation on the upflow/low speed side is greater than that of
the perturbation on the downflow/high speed side) due to the parabolic profile, just as observed by
Yang. These areas of agreement occur in the range t < [illegible] (here, t and y are dimensional). This
is a diffusive time, corresponding to the diffusion of the vortex core to the boundary.

At  longer  times,  the  velocity  gradient  at  the  vortex  center  rotates  back  to  its  original  orientation 
and  value,  and  the  original  undisturbed  flow  is  restored  (Figure  11).  This  is  because  circulation  is 
lost  at  the  boundaries  (both  viscous  and  slippery).  Since  Pearson  and  Yang  worked  in  unbounded 
domains,  their  circulation  remained  constant  (even  though  they  eventually  had  a  finite  amount  of 
vorticity  of  vanishingly  small  density  spread  over  an  infinite  area),  so  their  undisturbed  flows  were 
never  restored.  In  our  present  bounded  domain,  we  see  the  gradient  at  the  center  of  the  vortex  begin 
to  be  restored  soon  after  the  core  hits  the  wall  (eg,  Figure  20). 

Suri’s solution (Equations 7 to 11) yields a decay rate of $\exp[-(1 + 1/a^2)\pi^2 t/R_v]$ for a vortex above a
slippery wall. How much faster does the vortex decay when above a viscous wall? In Appendix A,
we show that the asymptotic decay rate is governed by a constant k determined from a transcendental
equation relating $\tan\sqrt{k}$ to a hyperbolic tangent [equation illegible in source].

This yields the same decay rate as Suri’s in an infinitely skinny box ($a = 0$), where the viscous wall is
negligible, but a decay rate almost twice that of Suri’s in a semi-infinite domain ($a \to \infty$). This
means that the flow variables decay like exp(−bt/sec) (t is dimensional here) in typical transition
experiments.
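The "almost twice" comparison can be reproduced in the limit $a \to \infty$ with a short reduction of our own, which is not the Appendix A derivation. Neglect the nonlinear terms, take $\omega = g(y)e^{-\lambda t}$ with no z dependence, require $\omega = \psi = 0$ on the free surface ($y = 1$) and $\psi = \psi_y = 0$ on the viscous wall ($y = 0$), and solve $\psi'' = -g$. With $g = \sin[q(1 - y)]$ these conditions reduce to $\tan q = q$ with $\lambda R_v = q^2$, whereas a slippery bottom gives $g = \sin \pi y$ and $\lambda R_v = \pi^2$, the $a \to \infty$ limit of Suri's rate. The symbol q and the helper name are ours.

```python
import math


def first_root_tan_q_equals_q(tol=1e-12):
    """Bisection for the first positive root of tan q = q, written as
    sin q - q cos q = 0 on (pi, 3*pi/2), where tan q is continuous."""
    f = lambda q: math.sin(q) - q * math.cos(q)
    lo, hi = math.pi, 1.5 * math.pi            # f(lo) = pi > 0, f(hi) = -1 < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


q = first_root_tan_q_equals_q()
rate_viscous = q**2            # lambda * R_v, viscous bottom, a -> infinity (our reduction)
rate_slippery = math.pi**2     # lambda * R_v, slippery bottom (Suri's rate as a -> infinity)
print(f"q = {q:.4f}, rate ratio = {rate_viscous / rate_slippery:.3f}")
```

The root is q ≈ 4.493, giving a decay-rate ratio of about 2.05, consistent with the comparison made in the text.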

Figure 7: Streamwise velocity profiles at successive times. Total elapsed time is small. Flow is
initially parabolic above a viscous wall; vortex is initially potential. Note the greater magnitude of
the low-speed perturbation, due to the parabolic mean flow.

Figure 8: Vertical streamwise velocity profiles at successive times for the flow in Figure 7. Vertical
profile is initially parabolic; greater perturbations occur after longer times. Note how $\partial u/\partial y \approx 0$ at this
vortex strength of $R_v = 6.5$, in agreement with Pearson and Abernathy (1984).

Figure 9: Vertical streamwise velocity profiles at successive times for a vortex of strength $R_v = 4$.

Figure 10: Vertical streamwise velocity profiles at successive times for a vortex of strength $R_v = 10$.

6 COUNTER-ROTATING VORTEX


One  of  the  primary  motivations  for  undertaking  this  investigation  was  to  look  into  the  ramifications 
of  Yang’s  secondary  structure:  is  it  a  counter-rotating  vortex  induced  by  the  main  vortex  above 
the  viscous  wall?  Does  it  affect  the  stability  of  the  flow,  and  thereby  reopen  questions  about  the 
existence  and  stability  of  single  streamwise  vortices  in  real  flows? 

At $a = 1$, no counter-rotating vortex is observed below some threshold $R_v$ (resolution-dependent:
≈ 10 on an 81 × 81 grid), whereas at higher $R_v$ a small counter-rotating vortex is induced in the
lower right corner (for a counterclockwise main vortex). At larger aspect ratios (eg, $a = 4$), the
secondary vortex is much more prominent; it rises from the wall to the same height as the main
vortex. Another counter-rotating vortex (much weaker than the first) is also induced in the lower
left corner (Figure 12). If the main vortex is started close to the wall, then this left vortex is initially
stronger than the right, but the right vortex becomes the stronger as the main vortex lifts away
from the wall (Figure 13). At higher $R_v$, the left vortex appears later, and the right vortex develops
from a point on the bottom wall closer to the main vortex (Figure 14).

There  are  two  alternative  explanations  for  this  induced  counter-rotating  vortex  on  the  right  side: 

1.  Negative  vorticity  is  generated  at  the  viscous  wall  on  the  bottom  and  is  convected  to  the 
right.  The  counter-rotating  vortex  begins  at  the  wall  on  the  right  when  this  region  of  negative 
vorticity  has  become  sufficiently  concentrated  with  respect  to  the  crossflow  velocity  of  the  main 
vortex. 

2.  The  pressure  is  high  on  the  right  and  left  sides  where  the  velocity  is  low,  and  low  between 
the  vortex  and  the  bottom  wall  where  the  velocity  is  higher.  This  constitutes  an  unfavorable 
pressure  gradient  on  the  right  side,  resulting  in  separation. 

In Appendix A, we indicate that this counter-rotating vortex will always be induced for main
vortices above a threshold strength that is dependent on the aspect ratio. At a = ∞, this threshold
is small, but non-zero. We find these induced vortices to vanish after a finite time. Also, since we
are comparing results with experiments conducted at a = ∞, it is a potential concern that the side
walls sometimes seem to play a part in the development of these induced vortices (e.g., Figures 12
and 13). In Appendix C, we show that there is no practical difference between the strength of the
vortex induced in a box of aspect ratio a ≈ 4 and the strength of the induced vortex in a semi-infinite
domain (a → ∞).

At high R_v, the peak circulation of the induced vortex is one order of magnitude smaller than
that of the main vortex (Figures 15 and 16). This small circulation has no observable effect on the
streamwise velocity profile (Figures 17 and 18). At more moderate R_v, the counter-rotating vortex
is at least two orders of magnitude weaker than the main vortex in circulation; within these ranges,
the induced vortex is insignificant and has no effect on the streamwise velocity profile.
Figure 12: Development of the induced vortex over time for an initially potential vortex of strength
R_v = 5. Box is of aspect ratio a = 4.


Figure 13: Development of the induced vortex over time for an initially potential vortex of strength
R_v = 5 started near the wall. The left induced vortex develops first, then the right induced vortex
overtakes it in strength as the main vortex moves away from the wall; a = 4.
Figure 14: Motion of the centers of the main and induced vortices for main vortices of different
strengths, R_v. In this figure, all vortices begin at the lowest vertical position shown and rise over
time; a = 4.


Figure 15: Development of the induced counter-rotating vortex over time. The main vortex is
initially potential, R_v = 25, rotating counterclockwise; a = 4.
Figure 16: Strength of the main (R_v = 25) and induced vortices of Figure 15. Even for a main
vortex of this large strength, the induced vortex is at least one order of magnitude weaker.
Figure 17: Vertical velocity v through the center of the induced vortex shown in Figure 15. Note
the insignificance of this velocity in the induced vortex (the negative values near z ≈ 3.2) and the
dominant vertical velocities underneath the main vortex (z ≈ 1.5 to 2.2).
Figure 18: Streamwise velocity profiles through the center of the induced vortex shown in Figure 15.
Velocities are greater at successive times since we follow the induced vortex as it rises. The large
perturbations near z ≈ 2 are due to the main vortex. Note the absence of any effect due to the
induced vortex (near z ≈ 3.2).
7 FINITE DISK OF VORTICITY


We have previously used both a delta function and a fully diffused distribution of vorticity as our
initial condition. However, we suspect that streamwise vortices in real flows begin with a finite size
on the order of the originating disturbance. For small initial disturbances, we expect that finite size
to be closer to a delta function than to a fully diffused distribution. What significant differences
arise from subjecting the initially undisturbed flow to a vortex of small but finite structure, as
opposed to a delta function?

For simplicity, we assume a vortex that begins with constant vorticity inside a cylinder of radius
r_0 and with zero vorticity outside. This seems a more realistic initial model of the vortex, yet
simple enough for clear analysis. For this finite-sized vortex, we see that the streamwise velocity
profiles are established more slowly over a larger core (Figure 19), rather than instantaneously at
the center and then, at immediately following moments, at the edge of the growing core, as for the
Oseen vortex.
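A minimal sketch of this initial condition, sampled on a uniform crossflow grid, is given below. It is not the authors' code: the NumPy grid, the box 0 ≤ z ≤ 4, 0 ≤ y ≤ 1 (aspect ratio a = 4), the vortex position, and the name `disk_vorticity` are illustrative assumptions. The constant level Γ/(π r_0²) gives the continuous disk a prescribed circulation Γ, so that vortices of different r_0 but equal circulation can be compared; on a grid the discrete circulation is only approximately Γ.

```python
import numpy as np


def disk_vorticity(z, y, zc, yc, r0, circulation):
    """Uniform vorticity inside a cylinder of radius r0, zero outside.

    z, y are 2-D coordinate arrays (e.g. from np.meshgrid). The level
    circulation / (pi r0^2) gives the continuous disk the requested circulation.
    """
    inside = (z - zc) ** 2 + (y - yc) ** 2 <= r0 ** 2
    return np.where(inside, circulation / (np.pi * r0 ** 2), 0.0)


# Hypothetical box of aspect ratio 4: 0 <= z <= 4, 0 <= y <= 1.
z1 = np.linspace(0.0, 4.0, 801)
y1 = np.linspace(0.0, 1.0, 201)
Z, Y = np.meshgrid(z1, y1)
dz, dy = z1[1] - z1[0], y1[1] - y1[0]

omega = disk_vorticity(Z, Y, zc=2.0, yc=0.5, r0=0.1, circulation=1.0)
discrete_circulation = omega.sum() * dz * dy  # midpoint-style quadrature
print(f"discrete circulation = {discrete_circulation:.4f}")
```

Refining the grid drives the discrete circulation toward Γ; a solver would normally renormalize the disk level by the discrete area if exact circulation matters.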

The streamwise velocity gradients in the core have not finished growing in this figure. If we plot
the slope at the vortex center, we see that the delta function of vorticity sets the slope immediately
(first time step) to a constant value. Vortices with successively larger radii (but the same R_v) take
successively longer times to set the slope at the center, but they all eventually set the slope at the
vortex center to the same constant value (Figure 20). After the core senses the walls, the slope
decays back to the undisturbed value in the same fashion for all cases. Note that it is possible for
the slope to begin decaying before its maximum value has been reached, e.g., if a finite-sized vortex
is placed with its core edge sufficiently close to a wall.

Do larger finite-sized vortices generate stronger induced vortices than smaller vortices (or even
delta functions) of the same strength? We have always observed the vortices induced by larger main
vortices to be weaker, although the two may be asymptotically equal, barring interference from the walls.

How does the time to set the gradient inside the core depend on the vortex strength R_v? Figure 21
shows that the time to set the gradient is independent of R_v, except when the vortex is so strong
that the gradient is rotated by more than 90°, i.e., when R_v > 6.5.

Summarizing, the diffusion time across the initial vortex radius sets another time scale (in addition
to the shear) on the growth of the instability, with delta functions setting the fastest time
scales and larger-radius structures setting slower time scales. This time required to set the gradient is
independent of R_v for R_v < 6.5. For walls that are sufficiently far removed, vortices of all different
radii (but at the same R_v) eventually (on their respective time scales) achieve the same core slopes
and velocity profiles, decay at the same time and rate, and therefore affect the flow equivalently. For
vortices sufficiently close to a wall, decay may stabilize a large-radius vortex while allowing a smaller
one (of the same R_v and whose edge is the same distance from the wall) to go unstable (Figure 22).

Therefore, sufficiently far from walls, the initial distribution of vorticity does not matter in the
long run. Near walls, the initially potential vortex is likely the most destabilizing, since its
profiles develop the quickest and stand the best chance of going unstable before wall damping comes
into play; it therefore represents a worst case.
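The statement that the initial radius only sets a time scale can be illustrated under strong simplifications that are assumptions of this sketch, not the setting of Figures 19 to 22: an unbounded domain, no shear, no walls, and pure 2-D viscous diffusion of the streamwise vorticity. For that heat-equation problem, the vorticity at the center of an initially uniform disk is ω(0, t) = Γ/(π r_0²) · (1 − exp(−r_0²/(4νt))), which tends to the delta-function (Lamb-Oseen) value Γ/(4πνt) once t ≫ r_0²/(4ν). The function names are illustrative.

```python
import math


def center_vorticity_disk(t, r0, circulation=1.0, nu=1.0):
    """Center vorticity of an initially uniform disk under pure 2-D diffusion.

    omega(0, t) = Gamma / (pi r0^2) * (1 - exp(-r0^2 / (4 nu t))); unbounded
    domain, no shear and no walls (simplifying assumptions).
    """
    return circulation / (math.pi * r0**2) * -math.expm1(-r0**2 / (4.0 * nu * t))


def center_vorticity_oseen(t, circulation=1.0, nu=1.0):
    """Same quantity for a vortex started as a delta function (Lamb-Oseen)."""
    return circulation / (4.0 * math.pi * nu * t)


nu = 1.0  # illustrative value; only t / t_diff matters below
for r0 in (0.05, 0.1, 0.25):
    t_diff = r0**2 / (4.0 * nu)  # diffusion time across the initial radius
    ratios = []
    for factor in (0.1, 1.0, 100.0):
        t = factor * t_diff
        ratios.append(center_vorticity_disk(t, r0, nu=nu) / center_vorticity_oseen(t, nu=nu))
    print(f"r0 = {r0:4.2f}: disk/Oseen at t/t_diff = 0.1, 1, 100 -> "
          + ", ".join(f"{q:.3f}" for q in ratios))
```

The printed ratios are the same for every r_0 (0.100, 0.632 and 0.995), since the ratio depends only on t/t_diff: a larger disk merely reaches the Oseen state later, on a time scale growing like r_0².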
Figure 19: Streamwise velocity profiles for a vortex which begins with finite radius = .1, R_v = 6.5.
Note how slowly the profile develops at the center of the core.
Figure 20: Perturbation of the streamwise velocity at the vortex center for vortices of different
initial radii (r_0), but the same strength (R_v = 6.5). Larger-radius vortices develop more slowly, but
all achieve the same eventual profile. Note how the original velocity gradient begins to be restored
after the vortex core diffuses to the wall at t ≈ 2.
Figure 21: Perturbation of the streamwise velocity at the vortex center for vortices of different R_v,
but the same initial finite radius (= .05). Note that the time to develop is independent of R_v until
the gradient is rotated by more than 180°.


Figure  22:  Two  vortices  of  equal  strength  and  distance  from  the  viscous  wall.  Because  the  larger 
vortex  develops  on  a  slower  time  scale,  it  may  be  stabilized  by  the  wall  while  the  smaller  one  goes 
unstable. 


8  DUPLICATING  YANG’S  SECONDARY  STRUCTURE 


Since the appearance of Yang’s secondary structure was one of the main motivations of this work, it is
reasonable to attempt to duplicate it in order to find some clues as to its nature. From our preceding
results, we know it is not an induced vortex above a plane viscous wall. We try to duplicate Yang’s
structure by artificially adding another vortex of opposite sign to the flow. After trials with many
initial sizes, strengths, and locations, we make two observations. First, the difference in the observed
core sizes of the two vortices requires at least one of them to start at finite size in order to achieve
this size differential. Second, the second vortex can only exist for a very short time when very
near the wall, so either the adjacent velocity field must permit it to grow away from the wall (which it
generally does not for an artificially introduced vortex), or the second vortex must be started away
from the wall.

The configuration which finally comes closest to duplicating Yang’s structure is a large finite
disk of vorticity (corresponding to his single vortex) and a weak (20% as strong as the main)
counter-rotating delta function of vorticity at the edge of the main vortex core (Figure 23). This
yields streamwise velocity profiles similar to Yang’s data, with effects of the secondary structure
undetected far from the wall and more pronounced closer to (but not at) the viscous wall (Figure 24).
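The initial condition described in this paragraph can be sketched on a grid as follows. This is not the authors' setup: the grid, the disk radius and position, the placement angle of the counter-rotating concentration on the disk edge, and the name `yang_like_initial_field` are hypothetical. The delta function is approximated by putting all of its circulation (−20% of the main circulation) into the single grid cell nearest the chosen edge point, and the disk level is normalized by its discrete area so the main vortex carries exactly Γ on the grid.

```python
import numpy as np


def yang_like_initial_field(Z, Y, zc, yc, r0, circulation, ratio=0.2, angle=0.0):
    """Uniform disk of circulation Gamma plus a -ratio*Gamma point vortex on its edge.

    Z, Y come from np.meshgrid(z1, y1) (default 'xy' indexing), so Z varies
    along axis 1 and Y along axis 0. 'angle' picks the edge point (hypothetical).
    """
    dz = Z[0, 1] - Z[0, 0]
    dy = Y[1, 0] - Y[0, 0]
    inside = (Z - zc) ** 2 + (Y - yc) ** 2 <= r0 ** 2
    # Normalize by the discrete disk area so the grid circulation is exactly Gamma.
    omega = np.where(inside, circulation / (inside.sum() * dz * dy), 0.0)
    # Grid cell nearest the chosen point on the disk edge.
    ze, ye = zc + r0 * np.cos(angle), yc + r0 * np.sin(angle)
    j = int(np.argmin(np.abs(Z[0, :] - ze)))
    i = int(np.argmin(np.abs(Y[:, 0] - ye)))
    omega[i, j] -= ratio * circulation / (dz * dy)  # discrete delta function
    return omega


z1 = np.linspace(0.0, 4.0, 401)
y1 = np.linspace(0.0, 1.0, 101)
Z, Y = np.meshgrid(z1, y1)
dz, dy = z1[1] - z1[0], y1[1] - y1[0]
omega0 = yang_like_initial_field(Z, Y, zc=2.0, yc=0.5, r0=0.2,
                                 circulation=1.0, angle=-np.pi / 2)
net = omega0.sum() * dz * dy
print(f"net circulation = {net:.3f}")  # 0.800
```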

To  be  this  strong  and  to  appear  in  this  location,  it  seems  a  reasonable  guess  that  this  secondary 
structure  is  induced  in  the  90°  angle  between  the  bottom  wall  and  the  vortex  generator,  rather  than 
above  just  a  plane  viscous  boundary  (Figure  25).  This  secondary  structure  might  be  reduced  by 
softening  the  sharp  corner  between  the  generator  and  the  table.  In  any  case,  this  second  vortex 
seems  to  be  only  about  20%  as  strong  as  the  main  vortex,  and  therefore  should  have  negligible  effect 
on  stability  for  vortex  strengths  within  our  typical  range  of  interest. 


Figure 23: Streamlines of an unequal counter-rotating vortex pair at t = 0 and at a later time. The
main vortex is more diffuse and five times stronger than the induced vortex.
Figure 24: Streamwise velocity profiles at various vertical positions, but at the same fixed time, for
the vortices of Figure 23. These profiles are qualitatively the same as Yang’s.
Figure 25: A possible geometry for generating the vortices and profiles of Figures 23 and 24. This
counter-rotating vortex induced in the viscous corner of Yang’s vortex generator may be the source
of his observed secondary structure.
9 STABILITY IMPLICATIONS


We can combine previous empirical and analytic data with some hypotheses to yield some simple-minded
implications about the stability of our flow.

Yang (as reported by Pearson (1985)) observes transition of streamwise vortices to typically
occur at a dimensional time of t_unstable ≈ 40/(∂u/∂y). Pearson’s eigenvalue calculations show that
the most unstable mode:

grows 10× by t = …
grows 100× by t = …
This supports t_unstable ≈ 40/(∂u/∂y) as a reasonable estimate. Another piece of supporting evidence
comes from Suri’s observations of the growth of an instability in a low-speed streak. The instability
grows over a dimensional distance x = 15 − 10 = 5 cm (his p. 139). His observations are taken at
y = 0.95 cm, at which u = 70 … (his p. 138). At u = 70 …, ∂u/∂y = … . Therefore the nondimensional
time for transition is t⁺ = … ≈ 42.9. Again, this supports the notion that t_unstable ≈ 40/(∂u/∂y).

Another observation that we will use is derived from our previous computations: for a vortex located
at y = .5, we observe wall damping affect the vortex center at a dimensional time of
t_damp ≈ 0.04h²/ν.

Assume that an unbounded flow with an embedded streamwise vortex undergoes transition (due
to linear instability) for R_v greater than some critical value (~3 according to Pearson) at time
t_unstable. Also assume that wall damping prevents this and stabilizes the flow if and only if the
vortex center feels the wall (at time t_damp) before t_unstable. Since we have previously shown that
flow variables decay like ~exp(−5t/sec) (for typical values of the flow parameters) after t_damp, this
seems a reasonable assumption. These assumptions yield three cases:


1. t_unstable > t_damp: stable
2. t_unstable = t_damp: marginally stable
3. t_unstable < t_damp: unstable
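For the marginally stable case of t_unstable = t_damp, with the shear at y = .5 equal to U/h for the
Poiseuille profile used below:

$$\frac{40}{\partial u/\partial y} = \frac{40\,h}{U} = \frac{0.04\,h^2}{\nu} \quad\Longrightarrow\quad Re = \frac{U h}{\nu} = 1000$$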
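The three cases and the marginal estimate amount to comparing two times. A minimal sketch, taking as in the marginal-case estimate t_unstable = 40/(∂u/∂y), t_damp = 0.04h²/ν for a vortex at mid-height, and a shear of U/h there; the function names and the sample Reynolds numbers are illustrative.

```python
import math


def t_unstable(shear, f=40.0):
    """Time for the disturbance to go unstable, t = f / (du/dy)."""
    return f / shear


def t_damp_midheight(h, nu, coeff=0.04):
    """Time for wall damping to reach a vortex at y = h/2, t = 0.04 h^2 / nu."""
    return coeff * h**2 / nu


def classify(t_u, t_d, rel_tol=1e-9):
    """The three cases listed above (with a float tolerance for equality)."""
    if math.isclose(t_u, t_d, rel_tol=rel_tol):
        return "marginally stable"
    return "stable" if t_u > t_d else "unstable"


def marginal_reynolds(f=40.0, coeff=0.04):
    """Re = U h / nu at which f h / U equals coeff h^2 / nu."""
    return f / coeff


h, U = 1.0, 1.0  # illustrative scales; only Re = U h / nu matters
for Re in (800.0, 1000.0, 1200.0):
    nu = U * h / Re
    t_u = t_unstable(U / h)
    t_d = t_damp_midheight(h, nu)
    print(f"Re = {Re:6.0f}: t_unstable = {t_u:5.1f}, t_damp = {t_d:5.1f} -> {classify(t_u, t_d)}")
print("marginal Re =", marginal_reynolds())
```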


Of course, changes in the empirical data or in the y location will change this value of Re. We
can look at an arbitrary y location in a parallel shear flow:


$$t_{\text{unstable}} = \frac{f}{\partial u/\partial y} \qquad (12)$$

(where f was 40 above).

$$t_{\text{damp}} = \frac{4g\,(y - y_0)^2 h^2}{\nu} \qquad (13)$$

where there is a single damping wall at y_0 (and where g was .04 above).

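The neutral curve is again located at t_unstable − t_damp = 0; in variables scaled by h and U, with
R = Uh/ν, this gives:

$$(y - y_0)^2\,\frac{\partial u}{\partial y} = \frac{f}{4gR} \qquad (14)$$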

For  a  given  velocity  profile,  we  can  take  either  Re  or  y  and  solve  for  the  other  via  this  expression. 
To maximize or minimize the instability, we would place the vortex at the location where
(t_unstable − t_damp)_y = 0, i.e., where the time to damp the disturbance most exceeds the time required
for the disturbance to succumb to linear instability, or vice versa. Therefore:

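$$(y - y_0)\left(\frac{\partial u}{\partial y}\right)^2 + \frac{f}{8gR}\,\frac{\partial^2 u}{\partial y^2} = 0 \qquad (15)$$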

Again, knowledge of either R or the most/least stable y position in a given parallel shear flow
allows us to solve for the other. These equations were derived for a single damping wall at y = y_0.
A similar analysis could be done for more complicated geometries.

In the case of our usual Poiseuille flow above a single damping surface at y = 0 and an unconstrained
free surface at y = 1, a neutral curve of R vs vertical position is found from Equation 14 to
be:

$$y^3 - y^2 + \frac{f}{8gR} = 0$$

Using the data f/g = 40/.04 = 1000:

$$y^3 - y^2 + \frac{125}{R} = 0 \qquad (16)$$

The discriminant vanishes when

$$\frac{125}{R}\left(\frac{125}{4R} - \frac{1}{27}\right) = 0 \quad\Longrightarrow\quad R = 843.75$$
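The minimum of the neutral curve can be checked numerically. This sketch assumes, as in the text, the nondimensional Poiseuille profile u = 2y − y² (so ∂u/∂y = 2(1 − y)), a single damping wall at y_0 = 0, and f/g = 1000; the neutral relation (y − y_0)² ∂u/∂y = f/(4gR) then gives R(y) = 125/(y²(1 − y)). The function names are illustrative.

```python
import numpy as np

F_OVER_G = 1000.0  # empirical ratio f/g = 40/.04 used in the text


def neutral_reynolds(y):
    """Neutral R from y^2 * du/dy = f/(4 g R) with u = 2y - y^2 and y0 = 0."""
    shear = 2.0 * (1.0 - y)
    return F_OVER_G / (4.0 * y**2 * shear)


def neutral_heights(R):
    """Heights 0 < y < 1 on the neutral curve: roots of y^3 - y^2 + f/(8 g R) = 0."""
    roots = np.roots([1.0, -1.0, 0.0, F_OVER_G / (8.0 * R)])
    real = roots[np.abs(roots.imag) < 1e-9].real
    return np.sort(real[(real > 0.0) & (real < 1.0)])


y = np.linspace(1e-3, 1.0 - 1e-3, 200_001)
R = neutral_reynolds(y)
i = int(np.argmin(R))
print(f"R_c = {R[i]:.2f} at y = {y[i]:.4f}")
print("neutral heights at R = 1000:", neutral_heights(1000.0))
```

This reproduces R_c = 843.75 at y = 2/3. At R = 1000 the neutral heights are y = 0.5 (the marginal case computed earlier) and y ≈ 0.809, with unstable positions between them.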

Similarly,  the  locations  at  which  a  disturbance  is  most/least  stable  are  found  from  Equation  15: 

$$y^3 - 2y^2 + y - \frac{f}{16gR} = 0$$


or, with the empirical value f/g = 1000:

$$y^3 - 2y^2 + y - \frac{62.5}{R} = 0$$


This discriminant vanishes at R = 421.875.
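For a given R, the most and least stable heights follow from the real roots of y³ − 2y² + y − 62.5/R = 0 that lie in 0 < y < 1. A short sketch using `numpy.roots`, under the same assumptions as the text (Poiseuille profile u = 2y − y², damping wall at y = 0, f/g = 1000); the function name is illustrative.

```python
import numpy as np

F_OVER_G = 1000.0  # empirical ratio f/g used in the text


def extremal_heights(R):
    """Roots in 0 < y < 1 of y^3 - 2y^2 + y - f/(16 g R) = 0.

    The larger root is the most destabilizing position and the smaller one the
    locally most stable position; no such roots exist below R = 421.875.
    """
    roots = np.roots([1.0, -2.0, 1.0, -F_OVER_G / (16.0 * R)])
    real = roots[np.abs(roots.imag) < 1e-9].real
    return np.sort(real[(real > 0.0) & (real < 1.0)])


for R in (400.0, 843.75, 2000.0):
    print(R, extremal_heights(R))
```

At R = 843.75 the upper root is exactly y = 2/3, so the "most unstable" curve passes through R_c; for larger R it moves further into the upper third of the flow.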

These curves are sketched in Figure 26. The curve T = t_unstable − t_damp = 0 is the neutral curve;
a disturbance on this curve will be neutrally stable. Unstable disturbances fall to the right of this
curve; stable disturbances fall to the left. R_c, the minimum possible flow Reynolds number that can
sustain instability, is 843.75, for sufficiently strong disturbances located at a dimensional height of
y = 2h/3. The upper part of the curve T_y = 0 denotes, for a given R, the most destabilizing position
for a disturbance. Likewise, the lower part denotes the (locally) most stable position. The "most
unstable" curve passes through R_c, as it must. The most unstable position is always in the upper
third of the flow (when R > R_c). The neutral R → ∞ at y = 0 (a disturbance cannot go unstable right
against the viscous bottom wall) and at y = 1 (the local shear is zero, so it takes infinitely long to
go unstable). It is interesting that near the wall the most stable position is not right at the wall,
but slightly above it.
Note from Equation 14 that the empirical number f/g is just a multiplicative constant that sets
the scale of the abscissa. Therefore, if its value is actually somewhat different from what we have
inferred, the only change to Figure 26 would be to rescale the numbers along the R axis; all else
would remain unchanged.
Figure 26: Neutral curve of the neutrally stable Reynolds number (for the most destabilizing mode
of a disturbance) at a given distance from the viscous wall. R_c occurs at 843.75. The curve on which
a disturbance is most stable/most unstable passes through this point, and bifurcates at R = 421.875.
10 STABILITY OF CORE VS UPFLOW REGION


Many researchers have been preoccupied with identifying and monitoring the development of instability
in the upflow region between a counter-rotating vortex pair (such as between the two legs of a
hairpin vortex). Yet we have already seen the highly inflectional profiles that can develop within the
core of a single vortex. Applying the same time scale assumptions and arguments used in deriving
the previous neutral curve (Section 9), we can investigate which vortex geometry permits the middle
upflow region to go unstable first, and which geometry permits the two vortex cores to be the first
to go unstable.

We consider a typical vortex pair, each leg of strength R_v = 5, as shown in Figure 32. Over
time, the two legs lift each other and push each other away (Figure 27). Figure 28 shows how the
inflectional velocity profiles develop at the vortex center for different initial distributions of vorticity;
as we have seen before, more diffuse initial distributions of vorticity (at the same total circulation)
perturb the streamwise velocity profiles more slowly, so that the maximum perturbation occurs at a
later time. For boundaries that are sufficiently far removed, this maximum value is the same for all
initial distributions of vorticity. Since boundaries are close at hand here (i.e., the virtual boundary
caused by the companion vortex), viscous damping may set in before the maximum streamwise
perturbation is achieved, and thus more diffuse vorticity distributions create smaller maximum
perturbations in the streamwise velocity.

Inflectional profiles also develop in the upflow region at the z location exactly between the vortex
cores. There may be zero, one, or two inflection points at this z location, and they may vary
in vertical location over time (Figure 29). For analysis, we assume that any instability at this
z location will occur at the y where ∂²u/∂y² is a minimum (if ∂²u/∂y² ≠ 0 for all y), at the y where
∂²u/∂y² = 0 (if that occurs at a unique y), or at the lower y at which ∂²u/∂y² = 0 (if there are two
such y locations). This last criterion is selected because the lower inflection point is located at or
near the vertical location of the vortex centers, where the shear has been perturbed the most. Figure 30
shows how this most unstable y location (defined via the above criteria) moves as a function of time
for different initial distributions of vorticity. The dips in y that appear from just after t = 0 to
near t ≈ .2 are where the flow goes from ∂²u/∂y² > 0 at all y to ∂²u/∂y² < 0 near the middle (yielding
two inflection points); we then follow the development and eventual disappearance of the lower (in y)
point where ∂²u/∂y² = 0. Note that very diffuse vortex pairs (radius = .5) never develop an inflection
point in the upflow region. When observing how ∂u/∂y develops over time at the y locations shown in
Figure 30, we find, as expected, that more concentrated vortices produce the greatest perturbations
in slope from equilibrium.
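The three-way rule for choosing the y location can be written as a small routine acting on a sampled vertical profile u(y) at the midplane between the cores. This is a sketch, not the authors' procedure: it assumes an increasing y array, second derivatives from `numpy.gradient`, and linear interpolation of the zero of ∂²u/∂y². The function name is hypothetical, and with more than two inflection points (a case the text does not discuss) it simply keeps the lowest.

```python
import numpy as np


def critical_height(y, u):
    """Height at which the midplane profile u(y) is assumed to go unstable.

    Lowest zero of d2u/dy2 if one exists (a unique zero, or the lower of two);
    otherwise the y where d2u/dy2 is smallest.
    """
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    d2u = np.gradient(np.gradient(u, y), y)
    negative = d2u < 0.0
    crossings = np.nonzero(negative[:-1] != negative[1:])[0]
    if crossings.size == 0:
        return float(y[np.argmin(d2u)])  # no inflection point
    i = crossings[0]  # lowest sign change of d2u/dy2
    # Linear interpolation of the zero between samples i and i + 1.
    return float(y[i] - d2u[i] * (y[i + 1] - y[i]) / (d2u[i + 1] - d2u[i]))


y = np.linspace(0.0, 1.0, 1001)
print(critical_height(y, np.sin(np.pi * y)))        # no inflection point
print(critical_height(y, (y - 0.3) ** 3))           # one inflection point
print(critical_height(y, np.cos(2.0 * np.pi * y)))  # two inflection points
```

The three test profiles are synthetic stand-ins for the computed upflow profiles; they only exercise each branch of the rule.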

The time required for the core to go unstable is:

$$t_{\text{core}} = \frac{40}{\left.\partial u/\partial y\right|_{\text{core}}}$$
(as in the previous neutral curve derivation, Equation 12). The time required for the upflow region
to go unstable is:

$$t_{\text{upflow}} = \frac{40}{\left.\partial u/\partial y\right|_{\text{upflow}}}$$

(since ∂u/∂z|_upflow = 0). Figure 31 shows how Δt = t_core − t_upflow develops over time. Δt > 0 means
that the upflow region will go unstable first, and Δt < 0 means that the vortex cores will go unstable
first. Fortunately, the curves are fairly horizontal (i.e., time independent), and we can identify the
dividing case Δt = 0 as being roughly at vortex radius ≈ .25 (Figure 32). Initial distributions
of vorticity more diffuse than this will cause the upflow region to go unstable first. Distributions
more concentrated than this will cause the vortex cores to go unstable first. This is consistent with
experimental results: Suri’s diffuse array of vortices experiences its initial rms increases in the
low-speed streak (his p. 139), whereas Yang’s single vortex begins as a more compact structure with no
apparent twin, and goes unstable first at the core.
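The comparison behind Figure 31 reduces to a sign test on Δt = t_core − t_upflow, with each time taken as 40 divided by the local shear magnitude, as in the Section 9 estimate. A minimal sketch; the function name and the shear values in the usage lines are placeholders, not data from Figure 31.

```python
def first_to_go_unstable(shear_core, shear_upflow, f=40.0, rel_tol=1e-9):
    """Compare t = f / (local shear) in the core and in the upflow region.

    Returns (dt, region) with dt = t_core - t_upflow, following the sign
    convention of Figure 31: dt > 0 means the upflow region goes first.
    """
    t_core = f / shear_core
    t_upflow = f / shear_upflow
    dt = t_core - t_upflow
    if abs(dt) <= rel_tol * max(t_core, t_upflow):
        return dt, "cores and upflow region together"
    return dt, ("upflow region" if dt > 0 else "vortex cores")


# Placeholder shear magnitudes, not data from Figure 31.
print(first_to_go_unstable(shear_core=2.0, shear_upflow=1.0))
print(first_to_go_unstable(shear_core=1.0, shear_upflow=1.5))
```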
Figure 27: Motion of the center of one of a pair of equal counter-rotating vortices over time. The
vortex shown is rotating counterclockwise, resulting in an upflow region in the neighborhood of
z = 2. The motion of the other vortex is obtained by reflecting about the line z = 2.
Figure 28: Perturbation of the streamwise velocity field at the center of the vortex core due to
vortices of different radii. Note that larger-radius vortices have less effect before the original profile
begins to be restored.
Figure 29: Vertical streamwise velocity profile in the upflow region between two counter-rotating
vortices soon after t = 0. Note the presence of two inflection points, with the lower one closer to the
two vortex centers where the flow has been most perturbed.


Figure 30: Vertical position of the lower inflection point in the upflow region between two
counter-rotating vortices. The dips at t < .2 are where the lower and upper inflection points appear and
separate from each other (and we follow the lower). At all other times there is no inflection point;
we then plot the location where the second derivative achieves its minimum value.
Figure 31: Time required for the core to go unstable minus the time required for the upflow region
to go unstable. These times are derived from the assumptions of Section 9. Note that these times
are approximately equal for vortices whose initial diameter is equal to their separation.
Figure 32: Three counter-rotating vortex pairs of constant separation and strength. Figure 31 shows
that the small concentrated pair on the left will go unstable in the cores first, the large diffuse pair
at right will go unstable in the middle upflow region first, and the center pair is the intermediate
case where cores and upflow region go unstable together.


11  CONCLUSIONS 


Before the vortex core hits the wall (t < t_damp), we find that the crossflow and streamwise velocity
profiles are locally the same as those derived by Pearson. After hitting the wall, we find that flow
variables decay back to their undisturbed values, typically as ~exp(−5t/sec). It is wall influence
that is responsible for restoring the original profile. We find a perturbation solution, analogous to a
previous analytic solution, that describes the flow well after this time.

Sufficiently far from walls, any initial distribution of vorticity has the same long-term effect on
stability. Close to a wall, smaller (more concentrated) vortices may be more destabilizing than larger
(more diffuse) ones of the same circulation.

Pearson’s velocity profiles are quite remarkable in that they show how just a tiny bit of streamwise
vorticity (e.g., a circulation of ~.4 … for liquid water) can destabilize a shear flow. The fact that
we achieve these same velocity profiles above a viscous wall, and the fact that we find them to be
relatively insensitive to initial vortex structure, allow us to conclude that the single streamwise vortex
is a promising model for describing laminar-turbulent transition in real flows and for investigating
(in a very idealized fashion) one of the most commonly recurrent structures in the self-sustenance
cycle of turbulence.

A streamwise vortex above a plane viscous wall whose strength is greater than some small (but
non-zero) threshold induces a counter-rotating vortex. This counter-rotating vortex exists for a finite
time, and is insignificant to stability within our typical parameter range of interest. So once a single
streamwise vortex is created, the secondary structure is irrelevant to the stability of the flow (within
the parameter range for which the main vortices are normally observed), and can be safely omitted
from analysis.

Yang’s structure seems to be a second counter-rotating vortex of smaller size and ~20% of the strength
of the main vortex, possibly induced in the 90° corner between the bottom viscous wall and the side
of the vortex generator. It has no effect on the streamwise velocity profiles or stability, so he
really does seem to have generated, investigated, and drawn conclusions about an essentially single
streamwise vortex.

The role of the Reynolds number in transition seems to be to set the local shear, and thus the
time scale of the instability, as a disturbance grows toward instability in a race against viscous
damping. At the critical Reynolds number, the time to go unstable is the same as the time required
for wall damping to be felt. Above R_c, the Reynolds number sets the local shear in the vicinity of
a disturbance high enough that the disturbance goes unstable on a time scale shorter than the
viscous time scale required for wall damping, so the local instability "beats" wall damping. Coupled
with some empirical data, this interpretation yields a critical Reynolds number for Poiseuille flow of
R_c ≈ 844.

In situations with a pair of equal counter-rotating vortices, the initial instability may develop
either in the common upflow region (diffuse vortices) or in the core centers (more concentrated
vortices), depending on the initial vortex structure.


The  first  author  gratefully  acknowledges  the  support  provided  by  the  Department  of  the  Army 
and  the  West  Point  Education  Center  through  the  Tuition  Assistance  Program. 
A PERTURBATION SOLUTION


Suri found a solution for the crossflow variables for a vortex above a slippery wall (Equations 7 to 11);
his solution is exact for a vortex that begins fully diffused in a specified way (Equation 6), and is
asymptotically accurate for other initial distributions of vorticity. It is desirable to have an analogous
expression for a vortex above a viscous wall, both for further simple analytic modeling and to see
what such a solution can tell us about our current counter-rotating vortex concerns.

A.1 CROSSFLOW SOLUTION

As shown in Figure 33, for a non-viscous bottom wall, the vortex circulation is constant until the
core hits the wall. The circulation then decays exponentially. For a viscous bottom, the circulation
instantaneously drops due to the vortex sheet generated at t = 0, then decreases as negative vorticity
continues to be generated at the wall. When the core reaches the wall, the circulation again appears
to assume an exponential decay (albeit at a different rate). A semilog plot of circulation vs time
confirms this exponential decay.

With this additional clue, analytic progress is possible. Under our two original assumptions
(concerning x independence and the free surface shape), the analytic crossflow problem is as in
Equations 1 and 2:

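$$\omega_t + v\,\omega_y + w\,\omega_z = \frac{1}{R_v}\left(\omega_{yy} + \omega_{zz}\right)$$

Boundary conditions: ω = 0 on ∂_left, ∂_top, ∂_right; w = 0 on ∂_bottom.

$$\psi_{yy} + \psi_{zz} = -\omega$$

Boundary conditions: ψ = 0 on ∂_left, ∂_top, ∂_right, ∂_bottom.

$$w = \psi_y, \qquad v = -\psi_z$$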

Since the circulation and other flow variables decay approximately exponentially after the core
reaches the wall:

$$\omega(z,y,t) = \omega'(z,y)\,\exp\left(-\frac{k\,t}{R_v}\right) \qquad (18)$$

and  similarly  for  the  other  flow  variables.  Substitution  yields: 


$$(\nabla^2 + k)\,\omega' = \varepsilon\,R_v\,e^{-k t_0/R_v}\left(v'\,\omega'_y + w'\,\omega'_z\right) \qquad (19)$$


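where

$$\varepsilon = \exp\left[-\frac{k\,(t - t_0)}{R_v}\right],$$

where t_0 is the time at which the exponential decay sets in (i.e., soon after wall effects begin to be
felt), and where we use the 2-D Laplacian:

$$\nabla^2 = \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}$$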
Adopting a regular perturbation expansion:

$$\omega' = \omega_0 + \varepsilon\,\omega_1 + \varepsilon^2\,\omega_2 + \cdots \qquad (20)$$


and similarly for the other flow variables. Note that assuming an expansion of this form makes our
previous time derivative (Equation 19) only approximately correct, since our expansion parameter
ε is also a function of time. However, computations lead us to expect that after a sufficient time the
time dependence will be carried mainly in the first few modes of this (essentially) weakly nonlinear
expansion; therefore we anticipate only a small error (to be checked shortly) and proceed. This
yields a sequence of elliptic problems, coupled through the boundary conditions:


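$$(\nabla^2 + k)\,\omega_0 = 0, \qquad \nabla^2\psi_0 = -\omega_0 \qquad (21)$$

$$(\nabla^2 + k)\,\omega_1 = R_v \exp\left(-\frac{k t_0}{R_v}\right)\left(v_0\,\omega_{0,y} + w_0\,\omega_{0,z}\right), \qquad \nabla^2\psi_1 = -\omega_1 \qquad (22)$$

$$(\nabla^2 + k)\,\omega_2 = R_v \exp\left(-\frac{k t_0}{R_v}\right)\left(v_0\,\omega_{1,y} + w_0\,\omega_{1,z} + v_1\,\omega_{0,y} + w_1\,\omega_{0,z}\right), \qquad \nabla^2\psi_2 = -\omega_2 \qquad (23)$$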


Boundary conditions for each ω_i and ψ_i component are individually the same as given above
for ω and ψ. Note that these boundary-value problems do not allow the specification of an initial
condition. Suri's solution (Equations 7 to 11) shares this feature. Soon after the core has reached
the boundaries, information about initial conditions has been diffused away.

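These equations are linear and separable. The solution to the zero'th order equations (21) is:

$$\omega_0 = A\cos\frac{\pi z}{a}\,\sin\left[\sqrt{k-\frac{\pi^2}{a^2}}\,(y-1)\right] \qquad (24)$$

$$\psi_0 = \frac{A}{k}\cos\frac{\pi z}{a}\left(\sin\left[\sqrt{k-\frac{\pi^2}{a^2}}\,(y-1)\right] - \frac{\sin\sqrt{k-\pi^2/a^2}}{\sinh(\pi/a)}\,\sinh\left[\frac{\pi}{a}(y-1)\right]\right) \qquad (25)$$

$$w_0 = \frac{A}{k}\cos\frac{\pi z}{a}\left(\sqrt{k-\frac{\pi^2}{a^2}}\,\cos\left[\sqrt{k-\frac{\pi^2}{a^2}}\,(y-1)\right] - \frac{\pi}{a}\,\frac{\sin\sqrt{k-\pi^2/a^2}}{\sinh(\pi/a)}\,\cosh\left[\frac{\pi}{a}(y-1)\right]\right) \qquad (26)$$

$$v_0 = \frac{A\pi}{ak}\sin\frac{\pi z}{a}\left(\sin\left[\sqrt{k-\frac{\pi^2}{a^2}}\,(y-1)\right] - \frac{\sin\sqrt{k-\pi^2/a^2}}{\sinh(\pi/a)}\,\sinh\left[\frac{\pi}{a}(y-1)\right]\right) \qquad (27)$$

$$P_0 = -A\,\frac{\sin\sqrt{k-\pi^2/a^2}}{\sinh(\pi/a)}\,\sin\frac{\pi z}{a}\,\cosh\left[\frac{\pi}{a}(y-1)\right] \qquad (28)$$

where k is determined from the eigenvalue relationship:

$$\frac{\tanh(\pi/a)}{\pi/a} = \frac{\tan\sqrt{k-\pi^2/a^2}}{\sqrt{k-\pi^2/a^2}} \qquad (29)$$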
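Because Equation 29 is transcendental, k has to be found numerically. The short Python sketch below is one way to do it, assuming the relation reads tanh(π/a)/(π/a) = tan μ/μ with μ = √(k - π²/a²), and restricting μ to the second branch of tan, (π, 3π/2), as argued below. The function name `decay_coefficient` and the use of plain bisection are illustrative choices, not the procedure used to produce Table 2.

```python
import math


def decay_coefficient(a: float) -> float:
    """Return the lowest admissible k satisfying Equation 29 for aspect ratio a.

    Equation 29 is taken as tanh(pi/a)/(pi/a) = tan(mu)/mu, mu = sqrt(k - pi^2/a^2),
    with mu on the second branch of tan, (pi, 3*pi/2). Pass a = math.inf for an
    unbounded box.
    """
    alpha = 0.0 if math.isinf(a) else math.pi / a
    # tanh(x)/x -> 1 as x -> 0 (a -> infinity).
    target = 1.0 if alpha == 0.0 else math.tanh(alpha) / alpha

    # tan(mu)/mu rises monotonically from 0 at mu = pi to +infinity near 3*pi/2,
    # so the root stays bracketed and bisection cannot lose it.
    lo, hi = math.pi, 1.5 * math.pi - 1e-9
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.tan(mid) / mid < target:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    return mu * mu + alpha * alpha
```

Under these assumptions, `decay_coefficient(1.0)` and `decay_coefficient(math.inf)` land close to the analytic values 26.2798 and 20.1907 listed in Table 2. Bisection is a safe choice here because tan μ/μ is monotonic on the bracket.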
For an initially fully diffused vortex (in the sense that it starts with the same form as ω_0 above)
above a viscous wall:


The  solution  to  the  first  order  equations  (22)  is: 


A2Ry  exp  — ^  sin  ^p(£sin  Jk-  **r(y  -  ^) 

sin \fk  -  £  I - - 

- 4*smh~  Sin  Vk~  ^y  ~  2) cosh  «(y  ~ 

A 

A*Ry  exp  sin  sin  Jk-  ^r(y  -  £) 

- *71 - Y  .  , —  sin  Jk  -  ~t  (y  -  i)  cosh  %(y  -  §) 

4  fca(fe  +  ^)  «nhf  V  2 

*Jk~$  sin 77- 4  r . . . ,, 


rJk-%  ainJk-%  y - - 

- y - — - Y  ■  —  cos  Jk-Js(y  -  l)sinh^(y-  ^) 

2  afc2(fc  +  *£)  smhf  V  aV  «  2 

,  (*  +  ^jr )  sin2  \Jk~—~^  B  sin  Jk-*g  .  2x,  lxx 

+( - r-r-* — ; - ,  - )sinh — (y-  £)) 

8fc2(*  +  a?£)sinh2*  isinh^f  a 

,a„  Ua  .  2 wz,Bjk-^  f - — 

A'Rvexp-—sin—{-  - cos^Jk- ^r{y  -  f) 


-  -  =°s  -  i)  c^h  f(1,  - 

2*/(*  +  #)sin  2\Jk~^$  B  sin  Jk  -  ,  2x,  lxx 

H - ( - Tj  — ; - .  r-r~-,z - )cosh  — (y  -  j)) 

“  8i2(F+s£)sinh2i  ismhf  a  2' 

2  kt0  2n  2 irz  B  .  f  ^7  , 

— A  Rv  exp  —  — - cos - (  —  sin  Jk-  (y  -  t) 

/tp  d  CL  K 


2o  kto  2tt  2 itz  B  .  r  ^7  , 

— A  Rv  exp  —  — - cos - (-sin  Jk-  *fr(y  - 

/tp  d  CL  K 

Ic  4.  sin  \/fc  -  “y  / - - 


irjk  —  fr  sin  y/ Jb  -  $  , - - 
R_v   a   Bottom wall   k (numerical)   k (analytic)
5     1   slippery      19.7550         19.7390
5     1   viscous       26.2775         26.2798
5     2   viscous       21.2513         21.2702
3     3   viscous       20.2251         20.6118
8     3   viscous       19.8104         20.6118
5     4   viscous       20.1585         20.4145
-     ∞   viscous       -               20.1907

Table 2: Computed vs. analytic decay coefficients (k)


(33) 


(34) 


Note the turning point at a ≈ 1.301.

Given the geometrically increasing effort required and the exponentially decreasing importance of
the higher-order modes, the first-order solution seems a good stopping point. Note that up through
the first-order solution, t_0 (the time at which ε becomes small) cancels out of the solution.

Note the similarity of Equation 24 to Suri's solution (Equation 7) for a diffused vortex above
a slippery wall. Both solutions for ω share a cosine term in the z direction, which goes to zero at
the slippery side walls, and a trigonometric term in y that goes to zero at the top surface. In the
y direction, the half-period of Suri's cosine is equal to the flow depth in order to enforce the slippery
boundary condition on the bottom wall. Our function (a sine with displaced argument) has a shorter
period, introducing just enough negative vorticity in the region near the viscous wall to enforce
the no-slip boundary condition.

We verify the accuracy of this approximate solution by calculating the residual between the
zero'th and first order solutions (Equations 24 and 30 substituted into Equations 18 and 20) and
the numerical solution (Figure 34). The residual of this approximate solution quickly
becomes small compared to the vortex circulation. Having demonstrated its accuracy,
we now turn to the implications of this analytic solution.

Flow variables decay like exp(-kt/R_v), where k is determined from Equation 29. Table 2 shows
values of the nondimensional decay rate coefficient (k) as found from numerical simulation and as
calculated analytically. Agreement is good. Discrepancies are attributable to the fact that we are
calculating the decay rate at a finite time and are not extrapolating to the asymptotic limit. This
is why, for example, the vortex at R_v = 8 inside a box of aspect ratio a = 3 is off by more than
a vortex at R_v = 3 at the same aspect ratio: the stronger vortex reaches the asymptotic decay
rate more slowly, but both share the same limit. For a single vortex in a box of infinite aspect ratio,
we find the decay rate coefficient to be k ≈ 20.19. (At this aspect ratio, the nondimensional decay
rate coefficient is exactly k = c², where c is defined in Equation 35 below.) In typical transition
experiments, such as Yang's and Suri's flow table experiments, this means that flow variables decay
like exp(-5t/sec) (t is dimensional here). We can put bounds on k. First, k > π²/a², since otherwise Equation 29


.  /(*  +  ^r)8in2  5  sin  yjk-ty  2x, 

+<  8t>(t+*W; - l*mhT(»-l» 


B  = 


sin 


\Jk-$ 


2  (k  +  #)(-*  sjk~*£  cos  yjk-*£  + 


2  sin 


in  yj  k  — 


4  it  3 
a3 


t&nh  ■ 


has the unique solution k = 0, i.e., the trivial solution. Now k > π²/a² implies that we must use the
second or higher branch of the tan function: writing μ = √(k - π²/a²), on the first branch tan μ/μ > 1,
whereas tanh(π/a)/(π/a) < 1, so no intersection is possible there. (Our relation actually admits a
countably infinite number of eigenvalues k, one for each positive branch of the tan function.) But
solutions on higher branches decay faster (each branch at least π² faster than the next lower one), so the
lowest possible branch will dominate in observations. Therefore, we should use the second branch of the
tan function. As Equation 29 is written, it equates the slope of the secant line through the origin
and a point on the tanh curve to the corresponding secant line for the tan function. The argument
√(k - π²/a²) must equal the value of the independent variable at the point of intersection, and so must
take on a value in the range (π, c], where c is determined from

$$\tan c = c \qquad (35)$$

on the second branch (so that c ≈ 4.49). Therefore the bounds on k are π²/a² + π² < k < π²/a² + c².
The actual solution k, along with its upper and lower bounds, is plotted in Figure 35. Note that the
lower bound is just Suri's decay rate (Equations 7 to 11) for a slippery bottom boundary; k achieves
this lower bound as a → 0 (i.e., in an infinitely skinny box), which makes sense: we would expect the
bottom viscous wall to have negligible effect as it is made infinitely small. From the figure, we also
observe that k is almost constant for a > 4; when the side walls are sufficiently far away, decay is
dominated by the viscous wall.
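The bounds above depend only on the root c of Equation 35. A minimal sketch, in standard-library Python with hypothetical function names, computes c on the second branch of tan by bisection and returns the lower and upper bounds π²/a² + π² and π²/a² + c².

```python
import math
from typing import Tuple


def second_branch_root() -> float:
    """Return c with tan(c) = c on the second branch of tan, (pi, 3*pi/2) (Equation 35)."""
    lo, hi = math.pi, 1.5 * math.pi - 1e-9
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # tan(x) - x is negative at x = pi and increases monotonically on this branch.
        if math.tan(mid) - mid < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def decay_bounds(a: float) -> Tuple[float, float]:
    """Return (lower, upper) with pi^2/a^2 + pi^2 < k < pi^2/a^2 + c^2."""
    c = second_branch_root()
    side_wall_term = (math.pi / a) ** 2  # equals 0.0 when a is math.inf
    return side_wall_term + math.pi ** 2, side_wall_term + c ** 2
```

At a = 1 the lower bound is 2π² ≈ 19.739, which matches the slippery-wall entry of Table 2, as expected from Suri's decay rate.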

Decay rates comprise an interesting additional benchmark. Figure 36 shows how the decay rate
of the circulation quickly approaches the theoretical value of Equation 29. Suri's decay rate for
a fully diffused vortex above a slippery wall (Equations 7 to 11) is only roughly half this value. A
streamwise analysis (see Section A.2 following), similar to that performed above for the crossflow,
shows that streamwise flow variables should have an asymptotic decay rate of k_s = (π²/R_v)(1/4 + 1/a²).
Figure 36 also shows the agreement between this rate and the calculated decay; convergence is
slower because decay is ultimately driven through spanwise processes, which act via coupling in the
convective terms. Again, agreement in both of these cases lends further confidence to
both the computational and analytic solutions.

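Analysis of Equations 24 to 29 shows that the vortex moves (asymptotically) to z = 0 and to a
y location determined from:

$$\frac{\cos\left[\sqrt{k-\pi^2/a^2}\,(y-1)\right]}{\cosh\left[\frac{\pi}{a}(y-1)\right]} = \frac{\cos\sqrt{k-\pi^2/a^2}}{\cosh(\pi/a)} \qquad (36)$$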


This y position is plotted in Figure 37. When a → 0 (infinitely skinny box), y = .5, as for a
slippery bottom, which again seems consistent. When a → ∞ (infinitely long box), y = 2 - 2π/c
(≈ .6). Therefore if a vortex is started nearly centered in y (as Yang does), it will rise only
slightly, explaining why he detects no movement.
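Equation 36 is read here as cos(μ(y - 1))/cosh(π(y - 1)/a) = cos μ/cosh(π/a), with μ = √(k - π²/a²). It is satisfied trivially at the wall, y = 0, so the vortex height is the interior root. The Python sketch below finds that root by bisection. The function name is hypothetical, and k must be supplied by the caller, for example from Table 2.

```python
import math


def vortex_height(a: float, k: float) -> float:
    """Asymptotic vertical position y of the vortex center from Equation 36.

    a may be math.inf; k should satisfy Equation 29 for the same a.
    """
    alpha = 0.0 if math.isinf(a) else math.pi / a
    mu = math.sqrt(k - alpha ** 2)

    def g(y: float) -> float:
        # Equation 36 rearranged; y = 0 is a trivial root (the no-slip wall itself).
        ratio = math.cosh(alpha * (y - 1.0)) / math.cosh(alpha)
        return math.cos(mu * (y - 1.0)) - math.cos(mu) * ratio

    # g < 0 just above the wall and g > 0 at the top surface, so bisect on [1e-6, 1].
    lo, hi = 1e-6, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a → ∞ (k = c²) this returns 2 - 2π/c ≈ 0.60, the limit quoted above.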

To investigate the induced vortex, we check for separation at the viscous wall, i.e., for points where the
wall shear vanishes; after reducing several indeterminate forms, this condition becomes lim_{y→0}(ω_0 + εω_1) = 0.
In other words, a point of separation is also a point with zero vorticity. We deduce that the separation
point(s) is (are) located at:


z  =  —  sin 

7T 


exp  jf-sinyjk-^  ■ 


2 AR„(  B  sin  Jk  -  + 


*>Jk-  K 


This induced vortex is present at t = 0; this solution does not model its beginning well, but does
better at later times.

The induced vortex disappears into the lower right corner at


2AR„(-Bsin  y/k  -  + 


«n3^/k- ■  *| 
4* tanh  — 

a _ 


The  induced  vortex  appears  only  if  the  main  vortex  is  above  a  threshold  strength: 


Rv  > 


sinyA-  £ 


2A(— Bsin  ^jk-*£  + 


^5) 


4  ir  tanh 


(37) 


This threshold strength is graphed in Figure 38. Note that a = 0 (infinitely skinny box) yields
a threshold of R_v → ∞, which again seems consistent. It is interesting that there exists a small
but nonzero threshold R_v for a → ∞. However, the threshold is low enough (≈ 1.2) that we will
generate a counter-rotating vortex for any interesting main vortex.

The preceding expressions simplify for the case a → ∞. Separation point(s):


_  2*e*P£ 

ARV  sin  Vk 

Conditions  for  separation: 

Rv  >  —  (1  -  secVk)- 
*  P 

Ry  ,  irRvp 

•  t  <  — t—  in - -p=- 

k  2(1  -  sec  Vk) 

Here, p is an empirical phase factor that extrapolates the exponential decay back to t = 0. In
this case the separating streamline between the two vortices is a vertical line through the above
z position. The induced vortex moves to the same y position as the main vortex (≈ .6) before receding
exponentially to the right.


A.2 STREAMWISE SOLUTION

Still under the assumption that the crossflow variables decay mostly like exp(-kt/R_v), we have:

$$u_t + v' u_y \exp\left(-\frac{kt}{R_v}\right) + w' u_z \exp\left(-\frac{kt}{R_v}\right) = \frac{2}{R_v} + \frac{1}{R_v}\left(u_{yy} + u_{zz}\right)$$

Assuming that the streamwise velocity has the same type of exponential decay after a certain
time (but at a possibly different rate):

$$u = 2y - y^2 + u' \exp(-k_s t)$$

Substituting and retaining zero'th order terms yields:

$$(\nabla^2 + R_v k_s)\,u' = 0$$


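Boundary conditions: u' = 0 on ∂_bottom; u'_y = 0 on ∂_top; u'_z = 0 on ∂_left and ∂_right

The solution is:

$$u = 2y - y^2 + C_u \sin\frac{\pi z}{a}\,\sin\frac{\pi y}{2}\,\exp(-k_s t) \qquad (38)$$

where C_u is an amplitude constant and

$$k_s = \frac{\pi^2}{R_v}\left(\frac{1}{4} + \frac{1}{a^2}\right) \qquad (39)$$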


Agreement  with  computations  is  shown  in  Figure  36. 

In  terms  of  dimensional  variables,  the  transient  part  of  Equation  38  decays  like  exp  — (^  + 

Note  that  this  zero’th  order  solution  is  independent  of  R;  ie,  the  asymptotic  streamwise  decay  rate 
is  fully  driven  by  the  crossflow  geometry.  Also  note  that  the  streamwise  decay  rate  is  essentially 
independent  (related  only  through  a)  of  the  crossflow  decay  rate. 
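Assume that the lowest streamwise mode is u' = sin(πz/a) sin(πy/2) on z ∈ [-a/2, a/2], y ∈ [0, 1]. This mode satisfies u' = 0 at the viscous bottom, u'_y = 0 at the flat top and u'_z = 0 at the slippery side walls. Under that assumption, the zero'th order equation (∇² + R_v k_s)u' = 0 gives R_v k_s = π²(1/4 + 1/a²), which is our reading of Equation 39. The sketch below checks this with a five-point finite-difference Laplacian. The function names and the step size are illustrative.

```python
import math


def scaled_streamwise_rate(a: float) -> float:
    """R_v * k_s for the assumed lowest streamwise mode: pi^2 (1/4 + 1/a^2)."""
    return math.pi ** 2 * (0.25 + 1.0 / a ** 2)


def helmholtz_residual(a: float, z: float, y: float, h: float = 1e-3) -> float:
    """Five-point finite-difference value of (laplacian + R_v k_s) u' at (z, y).

    u' = sin(pi z / a) sin(pi y / 2) is the assumed mode; the result should be O(h^2).
    """
    def u(zz: float, yy: float) -> float:
        return math.sin(math.pi * zz / a) * math.sin(math.pi * yy / 2.0)

    laplacian = (u(z + h, y) + u(z - h, y) + u(z, y + h) + u(z, y - h) - 4.0 * u(z, y)) / h ** 2
    return laplacian + scaled_streamwise_rate(a) * u(z, y)
```

The a-dependence enters only through π²/a², which is the sense in which the streamwise and crossflow decay rates are related only through a.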


Figure 33: Circulation of an initially potential vortex above both a slippery and a viscous bottom
wall. Above a viscous wall, note how the circulation instantaneously drops from 2π to 1.5π (since
this box is of aspect ratio a = 1) due to the vortex sheet generated at the wall. Decay quickly
becomes exponential.


Figure 34: Residual between our computational solution for an initially potential vortex above a
viscous wall and the zero'th and first order perturbation solutions. The circulation of the initially
potential vortex is included for reference.
Figure 35: Asymptotic decay rate for spanwise variables above a viscous wall as a function of the
aspect ratio of the vortex array (Equation 29), along with upper and lower bounds. The lower bound
is just Suri's decay rate for spanwise variables above a slippery wall.
Figure 36: Convergence of streamwise and spanwise decay rates to the asymptotic values
(dashed lines) of Equation 29 and Equation 39. R_v = 3, a = 2.
Figure 37: Asymptotic vertical position of the vortex center as a function of aspect ratio, as given
by Equation 36.
Figure 38: Minimum main vortex strength required to induce a counter-rotating vortex at the viscous
wall as a function of aspect ratio, from Equation 37.
B EFFECT OF FREE SURFACE

To  make  the  numerical  problem  more  tractable,  we  previously  assumed  that  the  free  surface  was 
constrained  by  a  normal  force  to  a  flat  slippery  surface.  This  allowed  us  to  remove  one  variable 
(free surface position) from the problem, but at the price of violating the τ_n = 0 (no normal
stress)  boundary  condition.  This  assumption  seems  to  be  well-justified  by  experiment;  however,  at 
this  point,  we  will  test  the  consistency  of  this  assumption  with  the  calculated  solution.  We  do  this 
during  the  flow  evolution  by  periodically  calculating  what  the  shape  of  the  free  surface  would  be 
if  we  were  to  let  it  deform  to  satisfy  the  above  no-normal-stress  boundary  condition.  The  shapes 
we  obtain  are  not  fed  back  into  the  flow  evolution  calculation;  rather,  if  the  shapes  we  find  are 
always  negligibly  different  from  a  flat  surface,  then  we  will  have  justified  the  flat-surface  assumption 
throughout  the  computation. 

First  we  calculate  the  pressure  throughout  the  flow  from: 

$$\nabla^2 P = 2\left(w_z v_y - w_y v_z\right) \qquad (40)$$

Boundary conditions: P_z = 0 on ∂_left and ∂_right

P_y = 0 on ∂_top (and on ∂_bottom if slippery)

Py  =  on  dhottomi if  viscous) 


This  determines  P  to  within  an  additive  constant  (which  may  be  time  dependent).  Then  we 
calculate  the  normal  stress  on  all  boundaries  from: 


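$$\tau_{ij} = -P\,\delta_{ij} + \frac{1}{R_v}\left(u_{i,j} + u_{j,i}\right) \qquad (41)$$

At this point, we choose the additive constant for P such that the normal stress τ_n = τ_yy on ∂_top satisfies

$$\int_{\partial_{\mathrm{top}}} \tau_n\,dz = 0 \qquad (42)$$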


This condition is necessary to satisfy continuity; the rationale will be given shortly. Note that our
previous slippery boundary condition on the flat plane ∂_top (i.e., ω = 0 coupled with h = 1) ensures that
there is no shear stress on ∂_top. We now relieve the normal stress on ∂_top through two mechanisms.

A first mechanism for relieving normal stress is to remove an appropriate mass of fluid from
positions with downward normal stress and to add an appropriate mass of fluid to positions with
upward normal stress, such that the weight of the fluid removed or added balances the excess or
deficient normal stress. Therefore:

$$\tau = \frac{R}{R_v}\,\Delta h\,g\cos\theta \qquad (43)$$

where Δh is the nondimensional position of the free surface relative to the constrained surface, g
is the nondimensional acceleration due to gravity, and θ is the angle of ∂_bottom (i.e., the x axis) from
horizontal. This angle is the source of fluid motion in the x direction; the component of gravity in
the x direction is the body force that works to maintain the parabolic profile. We previously found
this body force to be 2/R_v (Equation 3), so we find that:

$$g = \frac{2}{R_v\sin\theta}$$

We can now solve for our nondimensional change in free surface height:

$$\Delta h = \frac{\tau R_v^2\tan\theta}{2R} \qquad (44)$$
This is a first approximation to the curvature of the free surface. We can pause here and make
a crude order estimate of its magnitude. Substituting Equations 28 and 27 into Equation 41,
Equation  41  into  Equation  44,  and  evaluating  at  dtopia  the  neighborhood  of  a  ~  0(  1)  yields  A hmo*  ~ 

D 

.8-^.  So  within  our  typical  range  of  interest  we  have  A A  ~  <3(.001).  This  crude  estimate  indicates 
that  curvature  effects  at  the  free  surface  are  negligible. 

Equation  44  is  a  first  approximation  to  the  position  of  the  free  surface,  yet  there  is  another  effect 
that  should  be  considered.  By  adding  and  subtracting  fluid  at  the  upper  surface,  we  have  necessarily 
stretched  the  streamlines  in  some  locations  and  compressed  them  in  others,  thus  changing  the 
velocities  and  ultimately  both  the  pressure  and  the  stress.  This  will  further  change  the  free  surface 
height.  We  can  account  for  this  effect  by  appropriately  perturbing  our  first  approximation.  The 
calculation begins by finding the point on ∂_top where:


$$\tau_n = -P + \frac{2}{R_v}\,v_y = 0$$

(from Equation 41); in other words, we first find the point on the free surface where there is no
deformation. The existence of at least one such point is guaranteed by applying the mean value
theorem to the condition used to find the additive constant for the pressure (Equation 42). The
normal stress at a nearby point on ∂_top is:


.  2t/' 


=  — Pq  -  Py  Ah  -  P,  Az  + 


2PAA 


Rv  P„2tan0 


(45) 


where P_0 is the pressure at our starting point, tildes denote derivatives to be evaluated by centered
differences between the previous and current points, and primes denote quantities evaluated after the
free surface has been perturbed to relieve the normal stress due to streamline stretching. Note that
by construction, we would have τ_n = 0 at all points on ∂_top if we were to omit the primes (i.e., neglect
the stretching of streamlines). For small variation in the free surface height (our original conjecture,
to be verified through consistency), we can accurately assume the distance between streamlines to
be modified by a factor (1 + Δh); therefore the spanwise velocity w will be modified by a factor
1/(1 + Δh), and the transverse velocity v will be similarly modified through continuity. Spatial
derivatives are unaffected. Equation 45 can now be expressed in terms of quantities that are more easily computed.


rn  =  -  P0  +  A  A( 


2R 


Rv~  tan  9 


+ - - — (^  -  A zlww,  +  &))  + 

1+A  hK  Ru  ' *  1  +  AAV  Ru’ 


+ 


1 


(1  +  A/ip 


(Az  wwz ) 


(46) 


In deriving this equation, we have neglected vertical velocities introduced by free surface curvature.
We find our final free surface height by setting τ_n = 0 in the above equation and iteratively
solving for Δh. Then, using the newly calculated point, we repeat for the following point, and so on
for all points on ∂_top.
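The point-by-point procedure just described can be outlined in code. In this Python sketch the residual function is a hypothetical stand-in for the right-hand side of Equation 46. It gives τ_n at one point from that point's trial Δh and the already-solved neighbour's Δh, which enters through the centered differences. The secant iteration, the neighbour's height as starting guess, and all names are our own assumptions.

```python
from typing import Callable, List

# Hypothetical signature: residual(i, dh_neighbor, dh) returns tau_n at point i when the
# surface there is displaced by dh and the already-solved neighbour is displaced by dh_neighbor.
Residual = Callable[[int, float, float], float]


def solve_point(residual: Residual, i: int, dh_neighbor: float,
                tol: float = 1e-13, max_iter: int = 50) -> float:
    """Secant iteration for residual(i, dh_neighbor, dh) = 0, started near the neighbour's height."""
    x0, x1 = dh_neighbor, dh_neighbor + 1e-6
    f0, f1 = residual(i, dh_neighbor, x0), residual(i, dh_neighbor, x1)
    for _ in range(max_iter):
        if f1 == f0:  # flat secant: stop rather than divide by zero
            break
        x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
        f0, f1 = f1, residual(i, dh_neighbor, x1)
        if abs(x1 - x0) < tol:
            break
    return x1


def march_free_surface(residual: Residual, n: int, i0: int) -> List[float]:
    """Solve point by point outward from i0, where the surface is undeformed (dh = 0)."""
    dh = [0.0] * n
    for step in (1, -1):  # march to the right of i0, then to the left
        i = i0 + step
        while 0 <= i < n:
            dh[i] = solve_point(residual, i, dh[i - step])
            i += step
    return dh
```

Marching outward from the undeformed point in both directions mirrors the description above. Any root finder that converges reliably for small Δh would serve equally well.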

We calculate stresses as a preliminary to calculating the free surface position, and it is interesting to
note in passing the total stress and torque acting on the box of fluid. Figure 39 shows the total
stress acting in the z (spanwise) direction. Initially, the vortex applies a large jerking force to the
right as the vortex-induced flow pushes on the viscous wall; then the force changes direction as the
vortex applies increased stress against the left wall, and the net effect is then a small and decaying
force to the left. The stress in the y direction is a tiny downward force, practically zero.
The net torque on the box grows quickly in the counterclockwise direction, then decays (Figure 40).
In order to hold the box steady, equal and opposite forces and torques would have to be applied to
the box, i.e., a large leftward force as the box jerks to the right and then a small decaying force to
the right, along with a decaying clockwise torque. Of course, when considering an array of vortices,
the torques and horizontal forces are all balanced by those of the images; the very small downward
force is the only resultant force.

Figure 41 shows the height of the free surface in a box with an aspect ratio of one, and Figure 42
shows the free surface in a box of aspect ratio equal to eight. In all computations at R_v within our
range of interest (< 15), the contribution due to streamline stretching is totally insignificant and
the gravity effect dominates. We retain both in the computation, but the gravity-effect expression
for the free surface height alone is highly accurate. Therefore Δh is proportional
to τ (Equation 44). To preserve continuity we need ∫_{∂top} Δh dz = 0, and this is the reason (as
previously promised at Equation 42) that we used the condition ∫_{∂top} τ_n(z, 1) dz = 0 to choose the
additive constant for the pressure. The error in not including the streamline stretching effect in the
additive constant (via iteration) is negligible.
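The continuity argument above can be made concrete. The sketch below assumes that the normal stress on ∂_top is τ_n = -P + (2/R_v)v_y, which corresponds to a viscous coefficient of 1/R_v in Equation 41. It reads Equation 44 as Δh = τ_n R_v² tan θ/(2R) and uses uniformly spaced samples with the trapezoidal rule. All of these choices, and the function names, are assumptions for illustration. Because Δh is proportional to τ_n, shifting the pressure constant so that τ_n has zero integral (Equation 42) makes ∫ Δh dz vanish as well.

```python
import math
from typing import List, Sequence


def trapezoid(values: Sequence[float], dz: float) -> float:
    """Trapezoidal-rule integral of uniformly spaced samples."""
    return dz * (sum(values) - 0.5 * (values[0] + values[-1]))


def free_surface_height(p_top: Sequence[float], vy_top: Sequence[float], dz: float,
                        R_v: float, R: float, theta: float) -> List[float]:
    """Gravity-only free-surface displacement along the top boundary.

    Assumes tau_n = -P + (2/R_v) v_y on the top and Delta h = tau_n R_v^2 tan(theta) / (2R).
    P only needs to be known up to an additive constant.
    """
    tau = [-p + 2.0 * vy / R_v for p, vy in zip(p_top, vy_top)]
    length = dz * (len(tau) - 1)
    # Choose the pressure constant so the normal stress integrates to zero over the top;
    # because Delta h is proportional to tau_n, the integral of Delta h then vanishes too.
    mean_tau = trapezoid(tau, dz) / length
    tau = [t - mean_tau for t in tau]
    scale = R_v ** 2 * math.tan(theta) / (2.0 * R)
    return [scale * t for t in tau]
```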

The free surface in Figure 41 is similar to a sine curve slightly displaced from center. This
is consistent with the analytic solution found previously; substituting Equation 28 (at ∂_top) into
Equation 41, Equation 41 into Equation 44, and using Equation 27 to evaluate v_y, we have:


Ah  = 


AR, 


.2taa0,  2 ir 


2  R 


d  + 


2zr 

akRv 


sin 


sinhf 


„  kt 

)sm^-exp-  — 


(47) 


We  have  used  just  the  zero’th  order  terms  here;  they  should  suffice  since  the  vortex  diffuses  to  all 
four  boundaries  relatively  quickly  in  a  square  box.  So  the  perturbation  solution  predicts  exactly 
this  centered  sinusoid  for  the  free  surface  shape.  This  is  slightly  different  from  the  shape  of  the  free 
surface  derived  from  Suri’s  expression  (Equations  10  and  11)  for  the  pressure  distribution  above  a 
slippery  bottom: 


Ah  = 


R„ 2  t3  tan  9  tt  , 


1 


2(l  +  ^)T2t 

R„ 


—  sin^fexp - 7f- - ) 


.1+ 

Rv 


(48) 


Note that his pressure distribution is symmetric rather than asymmetric. It also decays twice as fast
as the convective stress term. Both expressions for the free surface height (Equations 47 and 48) share
the  same  shape  as  a  — f  0  and  t  becomes  large  (for  both  expressions:  Ah  ~  tan  0sin  exp  ~  ft')- 

The  free  surface  in  Figure  42  is  derived  from  a  box  of  large  aspect  ratio.  The  dip  centered  above 
the  vortex  at  z  =  4  is  due  mainly  to  the  low  pressure  caused  by  the  relatively  high  velocities  at  the 
core  edge,  and  is  aided  by  the  downward  normal  stress  from  the  convective  term.  The  raised  lip  to 
the  right  of  this  low  pressure  dip  is  totally  due  to  the  upward  normal  stress  from  the  convective  term. 
We have essentially Stokes flow in the two extreme ends of the box; therefore the free surface height
is governed by the pressure. In this case, pressure is high on the left, raising the surface, and low on
the right, lowering it. One crude way of understanding this pressure behavior is to consider a
two-compartment box (Figure 43), with the vortex at the center acting to pump fluid from right to left
at the top, and from left to right at the bottom. The viscous wall impedes the flow at the bottom by
exerting a force to the left. At the top there is no such restriction, and the vortex continues to pump
fluid to the left unimpeded. This pressurizes the left compartment and depressurizes the right. As
the vortex has not yet reached the side walls, we would have to take more terms of the perturbation
solution in order to derive an acceptable analytic expression to describe the figure, though one might
convince oneself that the region above the core is qualitatively described by Equation 47.

Figure 44 shows the free surface above a very strong vortex (R_v = 25). Here, the shape of the free
surface is totally dominated by the very high velocities, and thus very low pressures, inside the core.
The shape is almost a symmetric cosine, centered over the core. This is consistent with Equation 48
(which should also be the form of Equation 47 when higher order terms are included); the high R_v in
the denominator of the asymmetric stress term makes it negligible, leaving the symmetric part of
the pressure. In this case, the stress distribution above the viscous wall is actually well described
by the solution above a slippery wall, because the very great strength of the vortex renders viscous
wall effects relatively unimportant. As circulation decreases, the slippery wall solution becomes
increasingly less applicable.

Our ultimate purpose was to see whether or not the deformation of the free surface was a large
effect. We can see now that it is not. At R_v within our range of interest, the maximum deformation
is less than .3%. This is why free surface deformation in laminar flow has not been detected in the
course of Yang's or Suri's experiments: they estimate their observational threshold to be ≈ 5%.
Therefore they may be able to detect deformations due to vortices of strength R_v ~ 50 (provided
the flow remains laminar), but not much weaker. If these deformations later come within our
observational grasp, the above work predicts that increases in height will be detected above the
downflow region (high speed streak), and decreases in height above the upflow region (low speed
streak). This small value of the deformation is the result of the factor in Equation 47. In
our  range  of  interest  for  observed  laminar  flows  (R„  ~  5,  R  ~  2500),  the  deformation  is  then 
Ah  ~  C?(^g). 

We conclude that the deviation of the free surface from a slippery flat plane is indeed negligible
for values of R_v and R within our range of interest, and that our previous assumption of such a
slippery flat plane for ∂_top is well justified.
Figure 39: Horizontal stress on a vortex cell, a = 4, due to an initially potential vortex above a
plane viscous wall.


Figure 40: Torque about the center of the vortex cell due to normal forces on ∂_top, torque due to normal
and tangential forces on ∂_bottom, and the total torque due to an initially potential vortex above a
plane viscous wall. Positive torque is counterclockwise.


Figure 41: Free surface height at successive times due to an initially potential vortex above a plane
viscous wall, a = 1, R_v = 5.
Figure 42: Free surface height over time due to an initially potential vortex above a plane viscous
wall, a = 8, R_v = 5.
Figure 43: Development of high and low pressure regions in the Stokes flow regions near a vortex
above a viscous wall.


Figure 44: Free surface height at successive times due to an initially potential vortex above a plane
viscous wall, a = 4, R_v = 25.


C  COMPARISON  WITH  SEMI-INFINITE  DOMAIN 


Previous computations have all involved flow inside a box of some finite aspect ratio. When studying
the induced vortex, it has often seemed that the side walls played some part in its development,
particularly at small R_v (e.g., Figures 12 and 13). It is now desirable to remove the side walls and
consider the flow in a semi-infinite domain (bounded in y, unbounded in both x and z), which is more
representative of Yang's water table flow. Specifically, we want to see if our previous conclusions
about the induced vortex (existence, strength, effect on the streamwise flowfield) still hold.

To  do  this,  we  map  the  unbounded  coordinate  z  €  (-00,00)  onto  i  £  (— i,  via  the  mapping 

£  =  4  tan-1  z.  Equations  1  to  3  then  transform  as: 

Crossflow: 


w,  +  UUI,  +  UIU(  *4*  =  tJ-K,  +  salii  _ 

ily 

XV  =  i>y 

»  =  -*1^ 


Streamwise: 


«,  +  »«,  =  i-  + 

1L  ly  it  V 


(49) 

(50) 

(51) 

(52) 

(53) 


Boundary conditions and initial conditions are unchanged.

At an initial vortex strength of R_v = 5, we see that a major induced vortex appears on the
right and a minor one appears on the left (Figure 45). (Note that machine resolution stops short
of ±∞.) The major induced vortex initially forms very far to the right (as close to z = ∞ as
machine resolution allows); its center then moves to within two flow depths of the main
vortex and rises (Figure 46). However, the circulation of this induced vortex is still negligible:
three orders of magnitude smaller than that of the main vortex (Figure 47). Despite its presence,
vorticity contours remain almost symmetric in z (Figure 48), and it has no noticeable impact on the
streamwise velocity field (Figure 49).

At stronger $R_v$ (e.g., $R_v = 15$) the major induced vortex is stronger, but it remains two orders of magnitude weaker than the main vortex (Figure 50). Vorticity contours are more asymmetric because of increased convection of the negative vorticity generated at the viscous wall (Figure 51). The effect of the induced vortex on the streamwise velocity field is still negligible (Figure 52). At this higher $R_v$, note the highly inflectional profiles produced in the vortex core. This rotation of the shear angle by a little more than 180° is as predicted by Pearson for this $R_v$.
Figure 45: Development of the main and induced vortex streamlines over time, $R_v = 5$. The vortex cell is unbounded in $z$ and is mapped onto a finite domain. Smaller contour intervals are used for the induced vortex.


Figure 46: Motion of the predominant induced vortex over time. In the early stages, tangential velocities are very weak and the vortex center is not well defined. In this plot, the unbounded coordinate $z$ has been truncated.
Figure 47: Circulations of the main vortex (left graph) and right induced vortex (right graph) of
Figure  45.  The  circulation  of  the  induced  vortex  has  a  small  ‘noisy’  component  because  of  uncertainty 
in  the  location  of  the  separating  streamline  between  it  and  the  main  vortex. 


Figure  48:  Vorticity  contours  for  the  vortices  of  Figure  45.  The  initial  condition  is  a  delta  function 
of  vorticity. 


Figure  49:  Streamwise  velocity  contours  for  the  vortices  of  Figure  45.  Note  that  the  induced  vortex 
causes  no  noticeable  perturbation  (;  as  1.8). 
Abstract

While  type  theories  such  as  Nuprl  are  expressive  logics  for  theorem 
proving,  they  present  difficulties  for  designers  of  term  rewriting  systems. 
The  two  most  serious  difficulties  are:  1)  Nuprl  does  not  provide  a  global 
equality.  Instead  users  rewrite  over  arbitrary  user-defined  relations.  2) 
Each  rewrite  step  must  be  proved  valid.  In  general,  these  proofs  cannot 
be  recursively  generated. 

We have overcome these difficulties and designed a package that works well in practice. Our solution is an extensible system for directing and validating relational inferences. The heart of our package is a set of operators that use a user-supplied lemma database to create new rewrites from old ones. These routines place no restrictions on relations; a rewrite's success depends on the strength of the database. Overall, the package allows rewrites to be pieced together in numerous ways, providing the user with a tool to construct sophisticated rewrite strategies.


1  Introduction 

Our research addresses rewriting in Nuprl, a sequent calculus formulation of a constructive type theory similar to Martin-Löf's [13]. While most systems assume that their rewrite functions are reliable, in Nuprl every rewrite must be proven correct. This is difficult because Nuprl's expressive power yields undecidable typing problems: there is no effective procedure that determines whether a term belongs to a given type, or that determines the type of a given term. As a result, it is undecidable whether even simple rewrites are valid. For example, each type comes with its own equality, and a proof of $t =_T t'$ necessitates a proof that both $t$ and $t'$ are members of $T$. Furthermore, the validity of standard congruence (substitutability) reasoning must be proven. If $t =_T t'$, then to substitute $t'$

'This  work  was  supported,  in  part,  by  an  IBM  Fellowship. 

1 We shall follow these syntactic conventions: $T, A, B, \ldots$ represent types; $Q$, $R$, and $S$ relations; $t, r, s, \ldots$ terms; and $x, y, z, \ldots$ variables.
for $t$ in $B[t]$ one must show that for every $x$ and $y$ equal in $T$, $B[x]$ and $B[y]$ are equal types. This too is undecidable. We explore these problems and their implications for rewriting in Section 2.

Despite these difficulties, we have implemented a rewrite package that works well in practice. Our approach provides operators that construct relational conversions. Given a sequent $p$ and a term $t$, a conversion yields a triple: a relation $R$, a term $t'$, and an ML program called a tactic, which should prove $t \mathrel{R} t'$ under the assumptions in $p$. As with tactics, a conversion may fail; but if it succeeds, then $t \mathrel{R} t'$ may be used as an assumption.

Basic conversions are generally constructed from lemmas of the appropriate form (roughly, a universally quantified chain of implications that ends in a relation $t \mathrel{R} t'$). We provide an operator that takes a lemma and a relation and constructs a conversion which, if it succeeds, rewrites an instance of the left-hand side of the relation in the lemma into the corresponding instance of the right-hand side. Conversions themselves are data values, and higher-order combinators provide a means to compose them and form new conversions. For example, THENC is a combinator that sequentially composes two conversions, and SubConv applies a conversion to the immediate subterms of a term. A user-supplied lemma database is used in the construction of tactics that prove these rewrites valid. SubConv, for example, uses this database to produce a tactic that generates congruence proofs. No restrictions are placed on relations; they need not even be equivalence relations. However, whether a conversion succeeds depends on the strength of the database. The overall package is highly modular; conversions can be put together in numerous ways, allowing the user to construct sophisticated rewrite strategies. Section 3 details our implementation and illustrates the power of our approach with examples.

Our package is similar to Paulson's [14].² Both are collections of ML programs that can be applied in a modular higher-order style, and many of the combinators are functionally identical. However, our package differs in two important respects. First, Paulson's package allows rewriting only over two relations: term equality and formula equivalence. These are special cases of our general relational approach. Second, he provides separate notions of conversions (conv and fconv) for each relation, each with its own fixed strategy for proving rewrites valid. Our implementation allows the user to extend relational inference simply by expanding the lemma database. Of course, such extensibility is necessary in light of undecidable typing, but we have found the resulting generality useful in practice.

Our package is currently used by the author and other researchers at Cornell working in hardware verification. We have successfully constructed a variety of rewrite procedures, including a predicate calculus simplifier and associative/commutative term normalizers. We have also explored rewriting in constructive set theory. In this domain, set equality is a user-defined predicate. Congruence reasoning requires proofs that each set-theoretic operator respects this predicate, and these proofs must be composed to justify each rewrite. Without a package such as ours, the burden of constructing such justifications by hand would be unmanageable.

2 It also bears similarities to ideas of Howe and Steeg [9], who implemented rewriting in Nuprl over the (type-parametrized) equality relation.

Our  approach  to  rewriting  has  applications  outside  of  Nuprl  and  the  related 
theories of Martin-Löf. Logics with decidable typing but weak equality, such
as  the  Calculus  of  Constructions^],  also  require  that  users  define  their  own 
equalities  and  prove  their  rewrites  valid.  We  believe  that  as  such  logics  become 
more  popular,  extensible  rewrite  systems  such  as  ours  will  become  integrated 
into  their  use. 


2  Nuprl 

In  this  section,  we  highlight  those  aspects  of  Nuprl  relevant  to  rewriting.  The 
interested  reader  may  consult  [3]  for  a  more  complete  account. 

The basic objects of reasoning in Nuprl are types and members of types. The rules of Nuprl deal with sequents, objects of the form

$$x_1{:}H_1,\; x_2{:}H_2,\; \ldots,\; x_n{:}H_n \gg G.$$

To judge a sequent true essentially means that, when given members $x_i$ of $H_i$, one can construct an inhabitant of the goal $G$. Nuprl's rules are applied in a top-down fashion. That is, they allow us to refine a goal so that a proof of the goal may be constructed from proofs of the subgoals. Nuprl provides two kinds of inference rules: primitive rules and ML programs called tactics. Nuprl tactics are similar to those in LCF [6]: given a sequent as input, they apply primitive inference rules and other tactics to the sequent. Tactics serve as derived inference rules; their correctness is justified by the way the type structure of ML is used.

Nuprl's type theory is expressive; its intent is to facilitate the formalization of constructive mathematics. Types are stratified into a cumulative hierarchy of universes. A term is a well-formed type if and only if it inhabits some universe $U_i$. Type constructors include dependent function space, dependent product, disjoint union, equality, set type, and quotient type. Each type comes with its own equality relation. The equality type is a three-place relation $t_1 =_T t_2$ that is inhabited exactly when $t_1$ and $t_2$ are equal members of $T$. Propositions are represented via the propositions-as-types correspondence: a proposition is true if and only if the type associated with it is inhabited.

A special case of type checking in Nuprl is deciding whether a program meets its specification. As a result, a number of typing problems are undecidable. These include type membership ($t \in T$), type inference, and type well-formedness ($T \in U_i$). The first two problems imply that there is no uniform way of proving the identity rewrite valid. Given a term $t$, we cannot infer a type $T$ it inhabits; even if we are given a $T$, we cannot prove $t =_T t$, as that requires a proof of $t \in T$. Type well-formedness is a semantic property, as there are syntactically legitimate yet meaningless terms. As we shall see in the next section, proofs of certain well-formedness goals may be thought of as proofs of relational congruence.


3  Generalized  Rewriting 

Rewriting is the process of finding a term $t'$ that is in some sense simpler than a given term $t$. Moreover, the terms must stand in some relation. Our rewrite functions are called conversions. Given a sequent $p$ and a term $t$, a conversion returns a rewrite triple: a relation $R$, a term $t'$, and a tactic tac that proves the assertion $t \mathrel{R} t'$ under the assumptions in $p$. We call $R$ the rewrite relation. We take an abstract view of relations. A relation is specified by a constructor that maps sequents and pairs of terms to terms, and a destructor that breaks down terms into pairs. Aside from the ability to construct and destruct relations, no other properties are assumed. Conversions are applied with Nuprl's cut rule: if $c$ is a conversion and $c(p)(t)$ returns the triple $\langle R, t', tac\rangle$, then $t \mathrel{R} t'$ may be proved with tac and used as an assumption.

In  the  remainder  of  this  section  we  describe  how  conversions  are  constructed 
and  composed.  Our  approach  provides  operators  that  construct  conversions 
using primitive inference rules and lemmas and provides combinators that build
conversions  from  simpler  ones.  This  modular  higher-order  approach  to  rewriting 
originated  with  Paulson  who  provides  an  account  in  [14].  Rather  than  duplicate 
Paulson’s  account,  we  shall  instead  focus  on  what  is  novel  about  our  approach: 
how  we  coordinate  rewrites  over  arbitrary  relations  and  our  use  of  lemmas  to 
direct  inference  and  validation  in  a  theory  with  undecidable  typing  problems. 

3.1  Basic  Conversions 

Basic conversions are of two types: primitive and lemma-based. Primitive conversions, such as the identity conversion IdConv and (values of) IdConvWithR, prove their rewrites valid with primitive inference rules. IdConvWithR takes a relation $R$ as an argument and returns the conversion $\lambda p.\lambda t.\langle R, t, tac\rangle$. A Nuprl library, an ordered collection of definitions, theorems, and other objects, is used to construct a database of lemmas expressing relational properties. For the identity rewrite, IdConvWithR searches this database for a lemma of the form

$$\gg t \mathrel{R} t.$$

If no such lemma is found, the relation is assumed irreflexive and the conversion fails. Otherwise the conversion succeeds and tac proves $t \mathrel{R} t$ with the found lemma. IdConv is similar to IdConvWithR, but instead of taking a relational argument, it uses the type inference routines described in [7] to infer a $T$ for the relation $=_T$.
The conversion fails when $T$ cannot be inferred, but this has not been a problem in our experience.
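The identity rewrite can be pictured with a small Python sketch. This is not the authors' ML code: ML failure is modelled by a hypothetical `ConvFailure` exception, a conversion is an uncurried function of the sequent `p` and the term `t`, tactics are symbolic tuples rather than proof programs, and the lemma names in `REFL_LEMMAS` are invented for illustration.

```python
# Sketch of IdConvWithR: the identity rewrite t R t, justified by a
# reflexivity lemma looked up in a (hypothetical) lemma database.


class ConvFailure(Exception):
    """Plays the role of ML failure: raised when a conversion cannot apply."""


# Hypothetical database: relation name -> name of a lemma proving ">> t R t".
REFL_LEMMAS = {"=": "equal_refl", "<=>": "iff_refl", "<=": "le_refl"}


def id_conv_with_r(rel, refl_lemmas=REFL_LEMMAS):
    """Return a conversion mapping (p, t) to the triple (rel, t, tac)."""
    def conv(p, t):
        # p (the sequent) is unused in this simplified sketch.
        lemma = refl_lemmas.get(rel)
        if lemma is None:
            # No reflexivity lemma: rel is treated as irreflexive.
            raise ConvFailure(f"no reflexivity lemma for {rel!r}")
        tac = ("apply", lemma, t)  # symbolic stand-in for a tactic
        return (rel, t, tac)
    return conv
```

For instance, `id_conv_with_r("<=")(None, "x")` yields `("<=", "x", ("apply", "le_refl", "x"))`, while `id_conv_with_r("<")` fails on every term because the database holds no reflexivity lemma for `<`.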

Other primitive conversions include FailConv, the always-failing conversion; ReduceConv, which replaces redices by their contracta; and Simplify, which performs arithmetic simplification.

Lemma-driven conversions construct rewrite triples from lemmas in a user's library. These lemmas take the form

$$\forall x_1{:}T_1.\,\forall x_2{:}T_2.\ldots\forall x_n{:}T_n.\; A_1 \wedge A_2 \wedge \ldots \wedge A_m \Rightarrow s \mathrel{R} s'.$$

The function LemmaToConv takes two arguments, a lemma $l$ in which $m$, the number of assumptions, is zero, and a relation $R$, and returns a conversion $\lambda p.\lambda t.\langle R, t', tac\rangle$. LemmaToConv matches $t$ against the left operand of the relation $s \mathrel{R} s'$ in $l$.³ If the match fails, the conversion fails. Otherwise, the variables $x_i$ are bound by the match and used to instantiate $s'$, which is returned as $t'$. When executed, the tactic tac applies $l$ to prove $t \mathrel{R} t'$.
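The match-and-instantiate step behind LemmaToConv can be sketched in Python as follows. The term encoding is an assumption made for illustration: compound terms are tuples `(operator, arg1, ...)`, pattern variables are strings beginning with `?`, an unconditional lemma is a tuple `(name, s, R, s')`, and `ConvFailure` stands in for ML failure. Only first-order matching is shown; the second-order variants with a match discriminator are not.

```python
class ConvFailure(Exception):
    """Plays the role of ML failure."""


def is_var(x):
    # Hypothetical convention: pattern variables are strings beginning with "?".
    return isinstance(x, str) and x.startswith("?")


def match(pattern, term, binding):
    """First-order matching of pattern against term; returns a binding or None."""
    if is_var(pattern):
        if pattern in binding:
            return binding if binding[pattern] == term else None
        return {**binding, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        # Compound terms: same operator, same arity, matching arguments.
        if len(pattern) != len(term) or pattern[0] != term[0]:
            return None
        for p_arg, t_arg in zip(pattern[1:], term[1:]):
            binding = match(p_arg, t_arg, binding)
            if binding is None:
                return None
        return binding
    return binding if pattern == term else None


def instantiate(pattern, binding):
    """Replace variables in pattern (assumes every variable of s' occurs in s)."""
    if is_var(pattern):
        return binding[pattern]
    if isinstance(pattern, tuple):
        return (pattern[0],) + tuple(instantiate(a, binding) for a in pattern[1:])
    return pattern


def lemma_to_conv(lemma, rel):
    """lemma = (name, s, R, s') encodes the unconditional lemma  forall xs. s R s'."""
    name, lhs, lemma_rel, rhs = lemma

    def conv(p, t):
        if lemma_rel != rel:
            raise ConvFailure(f"{name} does not conclude a {rel!r} relation")
        binding = match(lhs, t, {})
        if binding is None:
            raise ConvFailure(f"{name} does not match the term")
        return (rel, instantiate(rhs, binding), ("instantiate", name, binding))
    return conv


# Hypothetical lemma:  forall x. x + 0 = x
PLUS_ZERO = ("plus_zero", ("+", "?x", 0), "=", "?x")
```

Applied to `("+", ("*", "a", "b"), 0)`, the conversion built from `PLUS_ZERO` returns `("*", "a", "b")` under `=`; applied to `("+", "a", 1)` it fails, since `0` does not match `1`.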

Conditional rewrites are constructed similarly. ImpLemmaToConv takes as inputs a lemma $l$, which may have assumptions, a relation $R$, and a tactic. Before constructing its rewrite triple, ImpLemmaToConv applies the input tactic to the instantiated assumptions $A_1, \ldots, A_m$ and fails if this application fails. The tactic tac appropriately combines both the lemma $l$ and the input tactic.

Both LemmaToConv and ImpLemmaToConv are driven by first-order matching. We have also implemented more powerful versions that are driven by second-order matching. As second-order matches are not necessarily unique, these functions take an additional argument, a match discriminator function that chooses an appropriate match from a set of matches.

3.2  Composing  Conversions 

Given  functions  that  construct  basic  conversions,  the  next  step  is  to  provide 
operators  that  combine  them.  The  two  basic  combinators  are  ORELSEC  and 
THENC.  The  former  provides  selective  composition  and  the  latter  a  method  of 
sequentially  composing  conversions. 

ORELSEC is based on ML failure. In ML, the expression $e_1 \mathbin{?} e_2$ computes the value of $e_1$ and, if that fails, computes the value of $e_2$. Thus, we define $(c_1 \text{ ORELSEC } c_2)(p)(t)$ as

$$c_1(p)(t) \mathbin{?} c_2(p)(t).$$
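A Python analogue of selective composition catches a failure exception where ML uses `?`. In this sketch `ConvFailure` is a hypothetical stand-in for ML failure, conversions take `(p, t)` uncurried, terms are tuples, and `plus_zero` and `times_one` are invented example conversions.

```python
class ConvFailure(Exception):
    """Plays the role of ML failure."""


def fail_conv(p, t):
    """FailConv: the always-failing conversion."""
    raise ConvFailure("FailConv")


def orelsec(c1, c2):
    """(c1 ORELSEC c2)(p)(t) = c1(p)(t) ? c2(p)(t)."""
    def conv(p, t):
        try:
            return c1(p, t)
        except ConvFailure:  # only conversion failure triggers the fallback
            return c2(p, t)
    return conv


def plus_zero(p, t):
    """Hypothetical basic conversion: x + 0 -> x under "="."""
    if isinstance(t, tuple) and len(t) == 3 and t[0] == "+" and t[2] == 0:
        return ("=", t[1], ("lemma", "plus_zero"))
    raise ConvFailure("plus_zero does not apply")


def times_one(p, t):
    """Hypothetical basic conversion: x * 1 -> x under "="."""
    if isinstance(t, tuple) and len(t) == 3 and t[0] == "*" and t[2] == 1:
        return ("=", t[1], ("lemma", "times_one"))
    raise ConvFailure("times_one does not apply")
```

For example, `orelsec(plus_zero, times_one)(None, ("*", "a", 1))` falls through to `times_one` and returns `("=", "a", ("lemma", "times_one"))`. Unlike ML's `?`, which traps any failure, the sketch catches only `ConvFailure`, so genuine programming errors are not silently treated as "try the next conversion".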

THENC uses a generalized kind of transitivity reasoning to sequentially compose conversions. Unlike selective composition, this requires use of the lemma

3 There are functions RevLemmaToConv and RevImpLemmaToConv which match against the right operand.
database. Given conversions $c_1$ and $c_2$, $(c_1 \text{ THENC } c_2)(p)(t)$ computes the triples

$$\langle R_1, t_1, tac_1\rangle = c_1(p)(t) \quad\text{and}\quad \langle R_2, t_2, tac_2\rangle = c_2(p)(t_1).$$

If either conversion fails, then THENC fails. Otherwise, THENC uses $R_1$ and $R_2$ as keys and searches the database for a sequencing lemma of the form

$$t \mathrel{R_1} s,\; s \mathrel{R_2} r \gg t \mathrel{R_3} r.$$

If such a lemma is found, the triple $\langle R_3, t_2, tac\rangle$ is returned, where $tac_1$, $tac_2$, and the lemma are appropriately combined into tac.
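The sequencing step can be sketched in Python with a dictionary standing in for the lemma database. Everything named here is an assumption for illustration: `SEQ_LEMMAS` maps a pair $(R_1, R_2)$ to $(R_3, \text{lemma name})$, enforcing at most one sequencing lemma per pair; tactics are symbolic tuples; `ConvFailure` models ML failure; and `const_conv` is a hypothetical helper that builds test conversions.

```python
class ConvFailure(Exception):
    """Plays the role of ML failure."""


# Hypothetical sequencing lemmas: (R1, R2) -> (R3, name), meaning that
#   t R1 s, s R2 r >> t R3 r   is available in the database.
SEQ_LEMMAS = {
    ("=", "="): ("=", "equal_trans"),
    ("<", "<="): ("<", "lt_le_trans"),
    ("<=", "<"): ("<", "le_lt_trans"),
    ("<=", "<="): ("<=", "le_trans"),
}


def thenc(c1, c2, seq_lemmas=SEQ_LEMMAS):
    """(c1 THENC c2): run c1, then c2 on its result, then chain the relations."""
    def conv(p, t):
        r1, t1, tac1 = c1(p, t)    # a failure of c1 propagates
        r2, t2, tac2 = c2(p, t1)   # a failure of c2 propagates
        entry = seq_lemmas.get((r1, r2))
        if entry is None:
            raise ConvFailure(f"no sequencing lemma for {r1!r}, {r2!r}")
        r3, lemma = entry
        # The combined tactic records the lemma and both sub-tactics.
        return (r3, t2, ("seq", lemma, tac1, tac2))
    return conv


def const_conv(rel, new_term, name):
    """Hypothetical conversion that rewrites any term to new_term under rel."""
    def conv(p, t):
        return (rel, new_term, ("by", name))
    return conv
```

With these definitions, chaining a `<` step and a `<=` step yields a `<` rewrite justified by `lt_le_trans`, whereas an `=` step followed by a `<` step fails, simply because this toy table has no entry for that pair.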

There is no need for the relations to be identical. For example,

$$t < s,\; s \le r \gg t < r$$

is a valid sequencing lemma. Our implementation insists that there be at most one sequencing lemma for any pair of relations; more general approaches allowing multiple sequencing lemmas are possible. Such approaches would be implemented analogously to the congruence reasoning routines described in Section 3.3.2.

ORELSEC and THENC, along with FailConv and IdConv, provide the basis for multi-way choice and repetition. The operator FirstC returns the first successful conversion from its argument list $[c_1; \ldots; c_n]$. It is equivalent to

$$c_1 \text{ ORELSEC } \ldots \text{ ORELSEC } c_n$$

and is defined recursively in terms of ORELSEC and FailConv. RepeatC(c) repeatedly applies the conversion $c$ until failure. It is defined recursively as

$$(c \text{ THENC } \mathrm{RepeatC}(c)) \text{ ORELSEC IdConv}.$$

With FirstC and RepeatC it is easy to construct a general chaining procedure in which relational inferences are combined to the extent that justifications are provided in the database.
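The two recursive definitions can be transcribed almost literally into Python. This is a sketch under stated assumptions, not the package itself: `ConvFailure` models ML failure, `id_conv` always rewrites over `=` instead of inferring a type as Nuprl's IdConv does, the sequencing table holds a single hypothetical transitivity lemma, and `plus_zero`/`times_one` are invented basic conversions.

```python
class ConvFailure(Exception):
    """Plays the role of ML failure."""


SEQ_LEMMAS = {("=", "="): ("=", "equal_trans")}  # hypothetical database


def fail_conv(p, t):
    raise ConvFailure("FailConv")


def id_conv(p, t):
    # Simplification: always "=", rather than inferring a type T.
    return ("=", t, ("by", "reflexivity"))


def orelsec(c1, c2):
    def conv(p, t):
        try:
            return c1(p, t)
        except ConvFailure:
            return c2(p, t)
    return conv


def thenc(c1, c2):
    def conv(p, t):
        r1, t1, tac1 = c1(p, t)
        r2, t2, tac2 = c2(p, t1)
        if (r1, r2) not in SEQ_LEMMAS:
            raise ConvFailure("no sequencing lemma")
        r3, lemma = SEQ_LEMMAS[(r1, r2)]
        return (r3, t2, ("seq", lemma, tac1, tac2))
    return conv


def first_c(convs):
    """FirstC [c1; ...; cn] = c1 ORELSEC (... ORELSEC (cn ORELSEC FailConv))."""
    if not convs:
        return fail_conv
    return orelsec(convs[0], first_c(convs[1:]))


def repeat_c(c):
    """RepeatC(c) = (c THENC RepeatC(c)) ORELSEC IdConv."""
    def conv(p, t):
        # Build the recursive combinator only when applied, so that
        # constructing RepeatC(c) itself does not recurse forever.
        return orelsec(thenc(c, repeat_c(c)), id_conv)(p, t)
    return conv


def plus_zero(p, t):
    """Hypothetical basic conversion: x + 0 -> x under "="."""
    if isinstance(t, tuple) and len(t) == 3 and t[0] == "+" and t[2] == 0:
        return ("=", t[1], ("lemma", "plus_zero"))
    raise ConvFailure("plus_zero does not apply")


def times_one(p, t):
    """Hypothetical basic conversion: x * 1 -> x under "="."""
    if isinstance(t, tuple) and len(t) == 3 and t[0] == "*" and t[2] == 1:
        return ("=", t[1], ("lemma", "times_one"))
    raise ConvFailure("times_one does not apply")
```

Here `repeat_c(first_c([plus_zero, times_one]))` rewrites `("+", ("*", "a", 1), 0)` to `"a"` in two steps, and on a term where neither conversion applies it falls back to the identity rewrite rather than failing.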

3.3  Congruence  Conversions 

3.3.1  Congruence  Proofs 

Subterm  rewriting  requires  the  construction  of  tactics  that  generate  congruence 
proofs.  These  proofs  can  be  subtle  and  in  practice  more  difficult  to  construct 
than reflexivity and transitivity proofs. We shall first examine these proofs in
the  simplest  possible  setting,  equality  congruence,  and  then  consider  general 
relational  congruence. 
Given $t =_T t'$, to conclude $r[t/x] =_{T'} r[t'/x]$, we must prove that equal members $x$ in $T$ yield equal members $r$ in $T'$ ($x$ may be free in $r$), i.e.,

$$\frac{\gg t =_T t' \qquad x{:}T \gg r \in T'}{\gg r[t/x] =_{T'} r[t'/x]}$$

The extra premise, that $r$ is functional in $x$, takes the form of a membership goal and, as discussed in Section 2, is in general undecidable.

Congruence proofs are by induction on the structure of Nuprl terms. A term is a tree: it is specified by an $n$-ary term constructor and its $n$ subterms. If the subterms $t_i$ of a term $T$ are rewritten via equality to $t_i'$, congruence is proven by decomposing $T$ into its immediate subterms and proving that $T$ is functional in the types of these terms. The $t_i$ are then either proved equal to the $t_i'$ by reflexivity (equality), proved equal by assumption (hypothesis), or recursively decomposed. We illustrate this with the following example. Given $r =_A s$, we prove

$$\gg f(g(r)) =_C f(g(s))$$

under the assumptions $f{:}B \to C$ and $g{:}A \to B$ as follows.

1 ... $\gg f(g(r)) =_C f(g(s))$ by intro using $B \to C$

1.1 ... $\gg f =_{B \to C} f$ by equality

1.2 ... $\gg g(r) =_B g(s)$ by intro using $A \to B$

1.2.1 ... $\gg g =_{A \to B} g$ by equality

1.2.2 ... $\gg r =_A s$ by hypothesis

In the above example, observe that each step in the congruence proof can be viewed as justifying a rewrite with respect to some (different) equality relation. As we “peel” through the various term constructors, our rewrite relation changes.

This approach of proving functionality by recursive decomposition generalizes to congruence proofs for arbitrary relational rewrites. Suppose $T$ is a term containing $n \ge 1$ subterms $t_i$ that are rewritten with respect to $n$ (possibly distinct) relations. Viewing $T$ as a term tree, for each term constructor $\theta$ on the path from each $t_i$ to the root of $T$, we must prove that $\theta$ is functional in the relations used to rewrite its immediate subterms. Formally, $\theta$ is said to be $R_0$-functional in the relations $R_1, \ldots, R_n$ whenever

$$\theta(t_1, \ldots, t_n) \mathrel{R_0} \theta(t_1', \ldots, t_n') \qquad (1)$$

is provable under the assumptions

$$t_1 \mathrel{R_1} t_1',\; \ldots,\; t_n \mathrel{R_n} t_n',\; C_1,\; \ldots,\; C_m. \qquad (2)$$
The $C_1, \ldots, C_m$ are additional auxiliary conditions which may be required of the $t_i$. We call the lemma that proves that Equation 1 follows from the assumptions in 2 a congruence lemma. A congruence proof consists of recursively decomposing a term by applying congruence lemmas until the relations among the resulting subterms follow by reflexivity or from a given conversion.

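As a simple example, suppose we are given the term $\forall x_1{:}T_1.\ldots\forall x_n{:}T_n.\,t_1$ and the conversion $c$, where $c(p)(t_1) = \langle \Leftrightarrow, t_1', tac_1\rangle$. Then $n$ applications of the congruence lemma

$$\gg \forall T{:}U_1.\,\forall P,Q{:}T \to U_1.\,\big(\forall x{:}T.\,(P(x) \Leftrightarrow Q(x))\big) \Rightarrow \big(\forall x{:}T.\,P(x) \Leftrightarrow \forall x{:}T.\,Q(x)\big),$$

which proves that the quantifier $\forall$ is functional in the relation $\Leftrightarrow$, reduce proving

$$\forall x_1{:}T_1.\ldots\forall x_n{:}T_n.\,t_1 \;\Leftrightarrow\; \forall x_1{:}T_1.\ldots\forall x_n{:}T_n.\,t_1'$$

to proving $t_1 \Leftrightarrow t_1'$, which is proved by $tac_1$.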

3.3.2  Inferring  Rewrite  Relations 

In our implementation, the function SubConv provides the basis for subterm rewriting. SubConv(c) is a conversion that applies the conversion $c$ to the immediate subterms of a term. The tactic it produces justifies the subterm rewrite. Repeated application of SubConv allows rewriting of arbitrary subterms.

To construct a rewrite triple, SubConv uses the lemma database as a source of congruence lemmas. However, there may be more than one choice of $R_0$ for a given $\theta$ and $R_1, \ldots, R_n$. For example, suppose that $\theta$ is the multiplication operator $*$, and $R_1$ and $R_2$ are $=$ and $<$. Then, with an additional inequality condition, the rewrite relation $R_0$ may be either $<$ or $\le$. That is, both

$$a = c,\; b < d,\; a > 0 \gg a * b < c * d \qquad (3)$$

and

$$a = c,\; b < d,\; a > 0 \gg a * b \le c * d \qquad (4)$$

are valid congruence lemmas. Moreover, in the proper proof context, each of these lemmas could find application.

How one determines which congruence lemma is applicable is an important question. If SubConv chooses an improper rewrite relation, then either the rewrite may eventually fail, because SubConv will be unable to create a tactic that constructs a congruence proof, or the chosen relation may be too weak, rendering the rewrite useless. Hence, it is important to have an effective strategy for selecting rewrite relations. In this section we outline two methods for controlling relational inference: a powerful but computationally expensive method, and a simplified heuristic method that works well in practice. Our implementation is based on the latter.
The most general method of subterm rewriting is to construct all possible rewrites permitted by the congruence lemmas contained in the database. Such a method necessitates a generalized type of conversion that returns not a single rewrite triple, but rather a set of triples. In such a setting, if a term $t$'s outermost operator is the $n$-ary constructor $\theta$, then SubConv(c)(p)(t) would:

1. Associate with each of the $n$ immediate subterms $t_i$ of $t$ either the set of rewrite triples returned by $c(p)(t_i)$ or, if this fails, the singleton set containing the triple returned by IdConv(p)(t_i). If all $n$ subterm rewrites fail, then SubConv fails.

2. Form all possible $n$-tuples where the $i$th element of the tuple comes from the triple set associated with the term $t_i$. For each such tuple, search the lemma database for a congruence lemma $l$ which states that, for some $R_0$, $\theta$ is $R_0$-functional in the tuple's $n$ rewrite relations. If such a lemma exists, use it to construct a new rewrite $\langle R_0, \theta(t_1', \ldots, t_n'), tac\rangle$, where tac applies $l$ and the $tac_i$.

3. Return the set of new triples, or fail if the set is empty.

The effect of the above construction is that SubConv generates, bottom-up, all possible congruence proofs. The resulting set of triples can be used to selectively add new facts to the hypothesis list of the sequent for subsequent inference.
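The exhaustive construction can be sketched in Python with `itertools.product` forming the $n$-tuples; the product is where the exponential cost comes from. Assumptions, all hypothetical: terms are tuples `(operator, arg1, ...)`, the congruence database maps `(operator, (R1, ..., Rn))` to a list of `(R0, lemma name)` pairs, auxiliary conditions such as $a > 0$ are not checked, `id_conv` always uses `=`, and `b_less_than_d` is an invented multi-valued conversion.

```python
from itertools import product


class ConvFailure(Exception):
    """Plays the role of ML failure."""


def id_conv(p, t):
    return ("=", t, ("by", "reflexivity"))  # simplification: always "="


# Hypothetical congruence lemmas, mirroring Equations 3 and 4.
CONGRUENCE_DB = {
    ("*", ("=", "<")): [("<", "mul_lt_cong"), ("<=", "mul_le_cong")],
    ("*", ("=", "=")): [("=", "mul_eq_cong")],
}


def sub_conv_all(c_all, congruence_db=CONGRUENCE_DB):
    """Exhaustive SubConv: c_all returns a list of triples, or fails."""
    def conv(p, t):
        if not isinstance(t, tuple):
            raise ConvFailure("term has no immediate subterms")
        op, args = t[0], t[1:]
        per_arg, any_success = [], False
        for arg in args:  # step 1
            try:
                triples = c_all(p, arg)
                any_success = True
            except ConvFailure:
                triples = [id_conv(p, arg)]
            per_arg.append(triples)
        if not any_success:
            raise ConvFailure("all subterm rewrites failed")
        results = []
        for combo in product(*per_arg):  # step 2: every n-tuple
            rels = tuple(r for r, _, _ in combo)
            new_t = (op,) + tuple(t2 for _, t2, _ in combo)
            for r0, lemma in congruence_db.get((op, rels), []):
                tacs = [tac for _, _, tac in combo]
                results.append((r0, new_t, ("cong", lemma, tacs)))
        if not results:  # step 3
            raise ConvFailure("no congruence lemma applies")
        return results
    return conv


def b_less_than_d(p, t):
    """Hypothetical multi-valued conversion: b -> d under "<"."""
    if t == "b":
        return [("<", "d", ("hyp", "b<d"))]
    raise ConvFailure("no rewrite")
```

On `("*", "a", "b")` this returns two triples, one under `<` and one under `<=`, both rewriting to `("*", "a", "d")`, which is exactly the ambiguity between Equations 3 and 4.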

While this approach makes the fullest use of the lemma database, its time complexity is exponential in the depth of the rewritten subterms. Our solution to this combinatorial explosion rests on the observation that in most cases, when there is a choice among rewrite relations, it suffices to pick the strongest relation. For example, it is preferable to know that two types are equal rather than merely related by bi-implication (if and only if). Similarly, bi-implication is a stronger relation than implication, and less-than is stronger than less-than-or-equal. So, for example, one generally prefers to use the congruence lemma given by Equation 3 over that given by Equation 4.

Our approach, which is linear in the depth of the rewritten subterms, always returns the strongest possible rewrite relation. We use a user-provided table to determine relative relational strength. SubConv(c)(p)(t) produces a single rewrite triple as follows:

1. The function $c(p)$ is applied to each subterm $t_i$. For each subterm this yields the triple $\langle R_i, t_i', tac_i\rangle$ or, failing that, the triple IdConv(p)(t_i). If all $n$ conversions fail, then SubConv fails.

2. The operator $\theta$ and the relations $R_i$ are used to index into the library for congruence lemmas that specify relations $R_0$ such that $\theta$ is $R_0$-functional in the $R_i$. When more than one such lemma is found, the one with the strongest $R_0$ is chosen. If no such lemma is found, SubConv fails.
3. A tactic tac is constructed that applies the lemma found in the previous step. This reduces the proof of $t \mathrel{R_0} t'$ (where $t' = \theta(t_1', \ldots, t_n')$) to the subgoals $t_i \mathrel{R_i} t_i'$. Each such subgoal is then proved by $tac_i$.

4. The rewrite triple $\langle R_0, t', tac\rangle$ is returned.
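The four steps can be sketched in Python; choosing `min` over a strength ranking is what keeps a single triple per level instead of a set. Assumptions, all hypothetical: terms are tuples `(operator, arg1, ...)`, `CONGRUENCE_DB` maps `(operator, (R1, ..., Rn))` to candidate `(R0, lemma name)` pairs, `STRENGTH` is a toy total ranking (lower is stronger) standing in for the user-provided strength table, auxiliary conditions such as $a > 0$ are not checked, and `id_conv` always rewrites over `=`.

```python
class ConvFailure(Exception):
    """Plays the role of ML failure."""


def id_conv(p, t):
    return ("=", t, ("by", "reflexivity"))  # simplification: always "="


# Hypothetical congruence lemmas, mirroring Equations 3 and 4.
CONGRUENCE_DB = {
    ("*", ("=", "<")): [("<", "mul_lt_cong"), ("<=", "mul_le_cong")],
    ("*", ("=", "=")): [("=", "mul_eq_cong")],
}

# Hypothetical strength table: lower rank means a stronger relation.
STRENGTH = {"=": 0, "<=>": 1, "<": 1, "=>": 2, "<=": 2}


def sub_conv(c, congruence_db=CONGRUENCE_DB, strength=STRENGTH):
    def conv(p, t):
        if not isinstance(t, tuple):
            raise ConvFailure("term has no immediate subterms")
        op, args = t[0], t[1:]
        triples, any_success = [], False
        for arg in args:  # step 1: rewrite each subterm, else identity
            try:
                triples.append(c(p, arg))
                any_success = True
            except ConvFailure:
                triples.append(id_conv(p, arg))
        if not any_success:
            raise ConvFailure("all subterm rewrites failed")
        rels = tuple(r for r, _, _ in triples)
        candidates = congruence_db.get((op, rels), [])  # step 2
        if not candidates:
            raise ConvFailure("no congruence lemma applies")
        r0, lemma = min(candidates, key=lambda cand: strength[cand[0]])
        new_t = (op,) + tuple(t2 for _, t2, _ in triples)
        tac = ("cong", lemma, [tac_i for _, _, tac_i in triples])  # step 3
        return (r0, new_t, tac)  # step 4
    return conv


def b_less_than_d(p, t):
    """Hypothetical conversion: b -> d under "<"."""
    if t == "b":
        return ("<", "d", ("hyp", "b<d"))
    raise ConvFailure("no rewrite")
```

On `("*", "a", "b")`, the subterm `a` falls back to the identity rewrite, `b` is rewritten under `<`, and of the two candidate lemmas the one concluding `<` (the analogue of Equation 3) is preferred over the one concluding `<=`.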

3.3.3  Subterm  Traversal 

SubConv provides the basis for traversing terms recursively. It is now easy to write an operator Depth(c) that recursively rewrites all subterms of a term in depth-first order. Its recursive definition is

$$(\mathrm{SubConv}(\mathrm{Depth}(c)) \text{ THENC } \mathrm{RepeatC}(c)) \text{ ORELSEC } \mathrm{RepeatC}(c).$$

Similarly, top-down rewriting is accomplished by Top(c), whose recursive definition⁴ is

$$\mathrm{ProgressC}(\mathrm{RepeatC}(c)) \text{ ORELSEC } \mathrm{SubConv}(\mathrm{Top}(c)).$$

3.3.4  An  Example 

Let $c$ be a conversion that rewrites $t$ to $t'$ under the less-than relation $<$. Suppose $m$ is non-negative, the sequent $p$ contains appropriate well-formedness hypotheses, and the database contains the congruence lemmas given by

$$a = c,\; b < d \gg a < b \Rightarrow c < d \qquad (5)$$

and Equation 3. Then

$$\mathrm{Top}(c)(p)(n < m * t)$$

returns the triple

$$\langle \Rightarrow,\; n < m * t',\; tac\rangle. \qquad (6)$$

Tracing Top's recursive execution, we find that after two calls to SubConv, $c(p)(t)$ returns the triple $\langle <, t', tac_1\rangle$. The second SubConv uses Equation 3 and the previous triple (as well as the triple in which $m$ is rewritten by the identity conversion) and returns the triple $\langle <, m * t', tac_2\rangle$. Finally, the first call to SubConv uses Equation 5 and returns the final triple, Equation 6.

The end result of the above rewrite is that $n < m * t \Rightarrow n < m * t'$ may be added to the hypothesis list, and this new hypothesis can be used for other rewrites or for forward and backchaining deductions. This example demonstrates how SubConv uses the lemma database to reason about inequalities. It also shows that our approach is strictly more general than Paulson's, which rewrites only over term equality and formula equivalence.

4 ProgressC(c) is a combinator that fails when c behaves like IdConv.
4 Conclusion


Our  system  is  a  practical  solution  to  an  important  theorem  proving  problem.  It 
dramatically  raises  the  level  of  user/system  interaction;  moreover,  it  provides  a 
foundation  for  building  high-level  proof  procedures  such  as  decision  procedures 
based  on  equational  reasoning  and  term  normalization.  Examples  are  given  in 
[1,2]. 

One area for future research is rewrite efficiency. Our approach essentially justifies rewrites twice: once when using the lemma database to construct rewrites, and again when the tactics are executed. An alternative is to build proofs directly; however, this approach is wasteful when conversions fail, and ORELSEC uses failure for selective composition. Another possibility is to reflect Nuprl's meta-language into the object language and prove conversions correct. Such an approach would obviate the need to construct tactics, as each rewrite would then be formally proved valid. Howe [8] has had some success building a partial-reflection library and verifying basic rewrite strategies. Unfortunately, his library is limited in scope and has its own efficiency problems, which stem from inefficiencies in Nuprl's evaluator. There is a research effort at Cornell to design a reflected object language that will better support this style of rewriting.
ABSTRACT. We use an adaptive mesh moving and refinement finite volume method to solve the transient Euler equations of compressible flow in one and two space dimensions. Numerical solutions are generated by a MacCormack scheme with Davis's artificial viscosity model. Richardson's extrapolation is used to calculate estimates of the local discretization error, which can be used to control mesh motion and refinement. Questions regarding the optimal combination of adaptive strategies and the characterization of the initial mesh are investigated. Results indicate that local mesh refinement with and without mesh moving provides dramatic improvements in accuracy over uniform mesh solutions; that mesh motion provides good results on relatively fine initial meshes; that each problem has an optimal initial mesh, and that it is more efficient to begin with a coarser-than-optimal mesh and refine than to start with too fine a mesh; and that a combination of both adaptive strategies produced the most accurate solutions.


1  This  research  was  partially  supported  by  the  SDIO/IST  under  management  of  the  U.  S.  Army  Research 
Office  under  Contract  Number  DAAL  03-86-K-0112. 
1. INTRODUCTION. Our goal is to develop reliable, robust, and efficient software for
solving hyperbolic partial differential equations. With this in mind, Arney and Flaherty
[4] developed an adaptive procedure combining mesh motion and mesh refinement for
solving one- and two-dimensional vector systems of time-dependent partial differential
equations. The solution, mesh motion, and local refinement procedures were explicit and
independent of each other; thus, modules can easily be replaced.

Arney and Flaherty's [4] method solves vector systems of hyperbolic conservation
laws  having  the  form 

$\mathbf{u}_t + \mathbf{f}_x(x, y, t, \mathbf{u}) + \mathbf{g}_y(x, y, t, \mathbf{u}) = 0$,   (1a)

with  initial  conditions 

$\mathbf{u}(x, y, 0) = \mathbf{u}_0(x, y)$,   (1b)

and with appropriate well-posed boundary conditions on a one- or two-dimensional
domain Ω. Their adaptive approach consists of moving a coarse "base" mesh of quadrilateral
cells to follow fronts and reduce dispersive errors. Recursive refinement of mesh
cells is performed when necessary to satisfy a prescribed local error tolerance. Solutions
are generated using MacCormack's [10] finite volume scheme coupled with Davis's [8]
artificial viscosity model to make the scheme total variation diminishing (TVD). Local
motion and refinement indicators on each cell of the mesh are used to control mesh
motion and refinement, respectively. They used an estimate of the local discretization
error obtained by Richardson's extrapolation [2,11] as the mesh refinement indicator. For
the examples presented in this paper, we used a normalized solution gradient as the mesh
movement indicator, although other choices are possible as long as the indicator is large
where additional resolution is required and small where less resolution is desired. An
automatic time step adjustment feature, which selects the largest time step permitted by
the Courant stability condition, is also provided in our algorithm.
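Richardson's extrapolation, used above as the refinement indicator, compares a solution computed with one step against one computed with two half steps. The sketch below shows the idea on a scalar ODE rather than on the Euler equations: it assumes a second-order one-step method (Heun's method stands in here for the MacCormack predictor-corrector), and `heun_step` and `richardson_error` are hypothetical names. For a method of order p, the difference between the two results divided by 2^p - 1 estimates the error of the finer result.

```python
import math


def heun_step(f, t, y, dt):
    """One step of Heun's method (explicit trapezoidal rule), second order."""
    k1 = f(t, y)
    k2 = f(t + dt, y + dt * k1)
    return y + 0.5 * dt * (k1 + k2)


def richardson_error(f, t, y, dt, p=2):
    """Richardson estimate of the error in the two-half-step solution.

    Returns (fine, estimate), where estimate approximates (exact - fine)
    for a one-step method of order p.
    """
    coarse = heun_step(f, t, y, dt)                 # one step of size dt
    half = heun_step(f, t, y, dt / 2)               # two steps of size dt/2
    fine = heun_step(f, t + dt / 2, half, dt / 2)
    return fine, (fine - coarse) / (2 ** p - 1)


# Model problem y' = -y, y(0) = 1, whose exact solution is exp(-t).
fine, estimate = richardson_error(lambda t, y: -y, 0.0, 1.0, 0.02)
actual = math.exp(-0.02) - fine
```

In a refinement code, cells whose estimate exceeds the local tolerance would be the ones flagged for bisection.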

The  generation  of  a  proper  initial  mesh  is  important  for  the  efficiency  of  any  adaptive 
algorithm. Initially we create a uniform mesh on Ω having a specified number of nodes
without  considering  the  possibility  of  any  high-error  regions.  A  global  mesh  refinement  is 
performed  on  the  first  time  step  to  estimate  the  discretization  error  of  the  initial  data.  The 
nodes  of  the  mesh  are  then  placed  to  equidistribute  this  error  estimate.  As  time  evolves, 
these  nodes  are  dynamically  moved  to  reduce  dispersive  errors. 
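Equidistribution places nodes so that each cell carries an equal share of the integrated error estimate. The one-dimensional sketch below is one common way to do this, not necessarily the authors' procedure: it forms the cumulative integral of a positive monitor function w (for example, an error estimate) with the trapezoidal rule and inverts it by linear interpolation. The name `equidistribute` and its arguments are hypothetical.

```python
def equidistribute(x, w, n_cells):
    """Place n_cells + 1 nodes so each cell holds an equal share of the integral of w.

    x: increasing background grid points; w: positive monitor values at x.
    Returns the new node positions, including both end points.
    """
    # Cumulative integral of w by the trapezoidal rule.
    cum = [0.0]
    for i in range(1, len(x)):
        cum.append(cum[-1] + 0.5 * (w[i] + w[i - 1]) * (x[i] - x[i - 1]))
    total = cum[-1]

    nodes = [x[0]]
    j = 0
    for k in range(1, n_cells):
        target = total * k / n_cells
        # Advance to the background interval that contains the target value.
        while cum[j + 1] < target:
            j += 1
        # Linear interpolation of the inverse of the cumulative integral.
        frac = (target - cum[j]) / (cum[j + 1] - cum[j])
        nodes.append(x[j] + frac * (x[j + 1] - x[j]))
    nodes.append(x[-1])
    return nodes
```

A uniform monitor reproduces a uniform mesh, while a monitor that is larger on part of the domain pulls nodes toward that part.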

Arney and Flaherty [4,5] perform mesh motion based on an intuitive approach by
identifying computational cells having large motion indicators and clustering them into
isolated regions that are presumed to contain similar solution characteristics. The center
of the motion indicators of each clustered region is moved so as to follow the dynamics of
the solution. Remaining portions of the mesh are moved according to an algebraic function
so as to produce a smooth grid having minimal distortion. Most mesh points cannot move
independently but must be coupled to their immediate neighbors. The amount of movement
is determined by a function which ensures that the center $\mathbf{r}_m(t)$ of error clusters
moves according to the differential equation

$\ddot{\mathbf{r}}_m = 0$,   (2)
used by Coyle et al. [7]. Clustered regions created at one time step can subsequently be
destroyed when a dynamic phenomenon subsides. Similarly, two or more clusters can be
united when structures of the solution intersect.

Results obtained by using Arney and Flaherty's [1,3,4,5] adaptive algorithm in one
and two dimensions indicated that, in some instances, proper mesh motion was capable of
dramatically reducing errors for a modest increase in the cost of computation. In general,
however, mesh motion alone cannot produce solutions that satisfy arbitrarily prescribed
accuracy requirements. They, therefore, combined mesh motion with a local temporal and
spatial cellular mesh refinement strategy [4,6]. The space-time cells of a mesh that
violated the prescribed error tolerance were gathered into clusters and were recursively
bisected in space and time. The problem was solved locally on the successively smaller
domains created by the clustering and refinement. Initial and boundary data for any
refined mesh were determined by interpolation from their "parent" coarser mesh. Error
tolerances involved control of the local error per unit time step and were, thus, halved at
each refinement to account for the binary temporal refinement.
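A one-dimensional sketch of the recursive bisection and tolerance halving described above follows. It refines in space only, whereas the method in the text bisects space-time cells; the error estimate is a caller-supplied function (below, a hypothetical model with a steep feature at x = 0.5), and `refine` is a hypothetical name.

```python
def refine(a, b, tol, error_estimate, level=0, max_level=4):
    """Recursively bisect [a, b] until error_estimate(a, b) <= tol.

    The tolerance is halved at each level, mirroring the rule that the local
    error per unit time step must still be met after the time step is halved.
    Returns the leaf cells as (a, b, level) tuples, ordered left to right.
    """
    if error_estimate(a, b) <= tol or level == max_level:
        return [(a, b, level)]
    mid = 0.5 * (a + b)
    return (refine(a, mid, tol / 2, error_estimate, level + 1, max_level)
            + refine(mid, b, tol / 2, error_estimate, level + 1, max_level))


# Hypothetical error model: cubic in the cell width, much larger near x = 0.5.
def model_error(a, b):
    return (b - a) ** 3 * (1000.0 if a <= 0.5 <= b else 1.0)


leaves = refine(0.0, 1.0, 0.1, model_error)
```

The finest cells cluster around x = 0.5, and the refinement depth falls off away from it.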

A dynamic tree structure, where fine grids are regarded as offspring of coarser ones,
is used to manage the data associated with the motion and refinement strategies. Solutions
are generated by a preorder traversal of the tree; thus, the solution on each coarse mesh
precedes those on its finer offspring, which obtain their initial and boundary data from it.
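The traversal order can be illustrated with a minimal tree. In a preorder traversal each grid is visited before its offspring, so every fine grid is processed after the coarse grid that supplies its interpolated initial and boundary data. The `Grid` class and `solve_all` below are hypothetical stand-ins for the authors' data structure, and the per-grid `solve` callback is assumed.

```python
from dataclasses import dataclass, field


@dataclass
class Grid:
    """A node of the refinement tree: one grid and its finer offspring."""
    name: str
    level: int
    children: list = field(default_factory=list)


def preorder(grid):
    """Yield grids parent-first, so each coarse grid precedes its offspring."""
    yield grid
    for child in grid.children:
        yield from preorder(child)


def solve_all(root, solve):
    """Apply a per-grid solve routine (hypothetical) in preorder."""
    for grid in preorder(root):
        solve(grid)


root = Grid("base", 0, [Grid("A", 1, [Grid("A1", 2)]), Grid("B", 1)])
```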

Our results on solving shock problems for the one- and two-dimensional Euler equations
are presented in Section 2. We explore the relationship of the base mesh to the level
of refinement. We found, for example, that it is more effective to begin with a coarse
mesh and perform more refinement than to create a finer mesh which needs less
refinement. Effective mesh motion, on the other hand, required a finer base mesh rather
than a coarser one. The combination of mesh motion and refinement produced the best
results. Local refinement, with and without mesh moving, provides substantial improvements
in accuracy per unit cost relative to computations on uniform stationary meshes.


2. NUMERICAL EXPERIMENTS. Computer codes for one- and two-dimensional problems
based on Arney and Flaherty's [4] algorithm have been implemented in FORTRAN
on an IBM 3090-200S computer and tested on several problems [1,4]. In this paper, we
consider examples involving solutions of the Euler equations for a one-dimensional shock
tube and a two-dimensional piston problem. The Euler equations for a perfect compressible
fluid are studied in the conservative form


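$\mathbf{U}_t + \mathbf{f}_x(\mathbf{U}) + \mathbf{g}_y(\mathbf{U}) = 0$,   (3a)

where

$\mathbf{U} = [\rho,\ \rho u,\ \rho v,\ e]^T, \quad \mathbf{f}(\mathbf{U}) = [\rho u,\ \rho u^2 + p,\ \rho u v,\ u(e + p)]^T, \quad \mathbf{g}(\mathbf{U}) = [\rho v,\ \rho v u,\ \rho v^2 + p,\ v(e + p)]^T.$   (3b,c,d)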
Here, ρ is the fluid density; u and v are the Cartesian components of the velocity vector;
e is the total energy per unit volume; and the subscripts x, y, and t denote partial
differentiation with respect to the spatial coordinates and time, respectively. The pressure
p is evaluated according to the ideal gas equation of state as

$p = (\gamma - 1)\left[e - \rho(u^2 + v^2)/2\right]$,   (4)


where γ is the specific heat ratio of the fluid. Computational experiments were conducted
with γ = 1.4. Solution accuracy is appraised in the L1 norm

$\|\mathbf{e}(\cdot, \cdot, t)\|_1 = \max_j \iint_\Omega |e_j(x, y, t)|\, dx\, dy$,   (5)

where $e_j(x, y, t)$ is a piecewise constant approximation of $u_j(x, y, t) - U_j$ obtained by
using values at cell centers.
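For piecewise constant cell data, the integral in norm (5) becomes a sum over cells of |e_j| times the cell area, maximized over the solution components j. The sketch assumes the cell areas and the computed and exact values at cell centers are supplied as plain lists; `l1_error` is a hypothetical helper.

```python
def l1_error(areas, computed, exact):
    """Discrete norm (5): max over components j of sum |e_j| * cell area.

    areas[i] is the area (length in 1D) of cell i; computed[i] and exact[i]
    are sequences holding the solution components at the center of cell i.
    """
    n_comp = len(computed[0])
    norms = []
    for j in range(n_comp):
        total = 0.0
        for area, uc, ue in zip(areas, computed, exact):
            total += abs(ue[j] - uc[j]) * area
        norms.append(total)
    return max(norms)
```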


Example 1. Consider Sod's [12] one-dimensional shock tube problem, which consists
of solving (3), (4) with v = 0 and ∂( )/∂y = 0 subject to the initial conditions


$[\rho(x, 0),\ p(x, 0),\ u(x, 0)]^T = \begin{cases} [1.0,\ 1.0,\ 0.0]^T, & \text{if } -0.2 \le x \le 0.5, \\ [0.125,\ 0.1,\ 0.0]^T, & \text{if } 0.5 < x \le 1.5. \end{cases}$   (6)


A diaphragm at x = 0.5 separates two regions of a tube that contain gases at different
densities and pressures. The two regions are in a constant state and both fluids are
initially at rest. At time t = 0 the diaphragm is ruptured and three waves are generated:
a shock moving with velocity 1.7522, a contact discontinuity moving with velocity 0.9275,
and a centered expansion wave occupying 0.5 - 1.1832t ≤ x ≤ 0.5 - 0.0703t. The exact
solution [13] of this problem is
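$[u(x, t),\ \rho(x, t),\ p(x, t)]^T = \begin{cases} [0.0,\ 1.0,\ 1.0]^T, & \text{if } \eta \le -1.1832, \\ [0.9860 + \eta/1.2,\ (1 - u/5.9161)^5,\ \rho^{1.4}]^T, & \text{if } -1.1832 \le \eta \le -0.0703, \\ [0.9275,\ 0.4263,\ 0.3031]^T, & \text{if } -0.0703 \le \eta < 0.9275, \\ [0.9275,\ 0.2656,\ 0.3031]^T, & \text{if } 0.9275 < \eta < 1.7522, \\ [0.0,\ 0.125,\ 0.1]^T, & \text{if } 1.7522 < \eta, \end{cases}$   (7)

where η = (x - 0.5)/t.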
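Equation (7) translates directly into a function that can be sampled at cell centers when evaluating the error (5). The sketch below follows the case ordering of (7); where neighboring cases share an end point the choice is arbitrary, and the states agree there to the quoted precision except at the contact discontinuity η = 0.9275. The name `sod_exact` is hypothetical, and γ = 1.4 is built into the constants.

```python
def sod_exact(x, t):
    """Exact solution (7) of the shock tube problem, returned as (u, rho, p).

    Uses the wave speeds and intermediate states quoted in the text; valid
    for t > 0 with the diaphragm at x = 0.5 and gamma = 1.4.
    """
    eta = (x - 0.5) / t
    if eta <= -1.1832:                  # undisturbed left state
        return (0.0, 1.0, 1.0)
    if eta <= -0.0703:                  # centered expansion fan
        u = 0.9860 + eta / 1.2
        rho = (1.0 - u / 5.9161) ** 5
        return (u, rho, rho ** 1.4)
    if eta < 0.9275:                    # between the fan and the contact
        return (0.9275, 0.4263, 0.3031)
    if eta < 1.7522:                    # between the contact and the shock
        return (0.9275, 0.2656, 0.3031)
    return (0.0, 0.125, 0.1)            # undisturbed right state
```

Evaluating the fan formula just inside its tail reproduces the density 0.4263 and pressure 0.3031 of the adjacent constant state, which is a quick consistency check on the transcribed constants.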

The "base" mesh is the coarsest mesh used to solve a problem. It reflects the scale
on which dominant temporal and spatial changes in the solution occur. Selecting too
coarse a base mesh will result in excessive refinement. Selecting too fine a base mesh
will be inefficient. At present, selection of the base mesh is at the discretion of the user,
and in this first experiment we hope to provide guidance for this choice as well as for
future automated base mesh selection procedures. Six cases having base meshes of
N = 2^k, k = 3, 4, ..., 8, cells were solved on 0 < t ≤ 0.35. The maximum number of
refinement levels, the initial time step, and the local discretization error tolerance were set
at 8 - k, $3 \times 2^{\ldots} \times 10^{-4}$, and $2^{5-k} \times 10^{-5}$, k = 3, 4, ..., 8, respectively, so that the
finest allowable discretization and local error tolerance were constant for all six cases.
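The parameter choices can be checked in a few lines. This sketch assumes the tolerance formula 2^(5-k) × 10^-5, which agrees with the legible tolerance entries of Table 1 (4.0, 0.25, and 0.125 × 10^-5 for N = 8, 128, and 256); the initial time step is left out. `base_mesh_schedule` is a hypothetical helper.

```python
def base_mesh_schedule(k_values=range(3, 9), k_max=8):
    """Parameters of the six Example 1 runs with base mesh N = 2**k.

    Each run may refine k_max - k times; the tolerance 2**(5 - k) * 1e-5 is
    halved at every level, so the finest cells and tolerance coincide.
    """
    rows = []
    for k in k_values:
        levels = k_max - k
        tol = 2.0 ** (5 - k) * 1e-5
        rows.append({
            "N": 2 ** k,
            "levels": levels,
            "tol": tol,
            "finest_cells": 2 ** k * 2 ** levels,   # cells if fully refined
            "finest_tol": tol / 2 ** levels,        # tolerance on finest level
        })
    return rows
```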
  N    Error Tolerance   Max. No. of         Normalized CPU   No. of Space-   Effort per Unit
       (x 10^5)          Refinement Levels   Time (Effort)    Time Cells      Accuracy (x 10^3)
  8    4.0               5                   1.295            28162           3.71
 16    2.0               4                   1.066            23026           2.17
 32    1.0               3                   1.000            21006           224
 64    0.5               2                   1.104            21396           2.67
128    0.25              1                   1.533            25996           3.89
256    0.125             0                   4.104            63744           6.70

Table 1. Normalized CPU time, number of space-time cells, and effort per unit
accuracy at t = 0.35 with different initial base meshes for Example 1. The
parameters are adjusted so that the finest discretization and the corresponding
local error tolerance are constant for all cases.


Results for the normalized CPU time, the number of space-time cells, and the effort
per unit accuracy are reported in Table 1 for each of the six cases. Effort per unit
accuracy is the product of the normalized CPU time and the L1 error at the terminal time
(0.35 in this case). In Figure 1, we show how the effort per unit accuracy varies with the
logarithm of the number of cells in the base mesh. When the optimal base mesh is not known,
it is preferable to err toward a coarser base mesh rather than a finer one since, with our
procedures, refinement of a coarse mesh will decrease the effort/accuracy ratio. The number
of space-time cells varies in approximately the same ratio as the CPU time, suggesting that
the overhead associated with data management is minimal.
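The preference for a coarser base mesh can be read off Table 1 by pairing meshes that are equally far, in powers of two, below and above N = 32, the mesh used to normalize the CPU times. The sketch uses only the legible entries of the table's last column; `coarse_vs_fine` is a hypothetical helper.

```python
# Legible Table 1 entries: N -> effort per unit accuracy (x 10^3).
EFFORT = {8: 3.71, 16: 2.17, 64: 2.67, 128: 3.89, 256: 6.70}


def coarse_vs_fine(effort, center=32):
    """Compare meshes a factor 2**m coarser and finer than `center`.

    Returns (m, coarse effort, fine effort, coarse is cheaper) for every m
    for which both entries are available.
    """
    out = []
    m = 1
    while center // 2 ** m >= 1:
        coarse, fine = center // 2 ** m, center * 2 ** m
        if coarse in effort and fine in effort:
            out.append((m, effort[coarse], effort[fine], effort[coarse] < effort[fine]))
        m += 1
    return out
```

Both available pairs (16 vs. 64 and 8 vs. 128) favor the coarser mesh.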


Error Tolerance   Normalized   No. of Space-   ||e||_1
(x 10^5)          CPU Time     Time Cells      (x 10^3)
128.0              1.000          910          25.7
 32.0              4.473         7532          12.7
  8.0              9.370        19322           6.20
  2.0             15.610        34562           3.03

Table 2. Normalized CPU time, number of space-time cells, and global L1 error
at t = 0.35 as a function of the local error tolerance for Example 1 using local
mesh refinement.


Figure 1. Effort per unit accuracy vs. log2 of the number of elements in the base mesh for
Example 1.

We continued our experiments by solving this problem on -0.2 ≤ x ≤ 1.5 for
0 < t ≤ 0.35 using local mesh refinement on 16-element base meshes, an initial time step
of 0.0035, and varying error tolerances. Refinement was restricted to a maximum of
four levels to avoid excessive refinement near shocks. The normalized CPU time, the
number of space-time cells used to solve the problem, and the errors in L1 at t = 0.35 are
presented in Table 2 as functions of the local discretization error tolerance. For small
tolerances, the CPU times and the number of space-time cells increase at approximately
the same rate as the L1 error decreases, again indicating a minimal overhead associated
with refinement. Each fourfold reduction in the local pointwise error tolerance roughly
halves the actual global L1 error; that is, the tolerance decreases quadratically relative to
the global error, which is what one would expect for problems having smooth solutions.
The result apparently carries over to this shock problem.
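The quadratic relation between tolerance and global error can be checked against Table 2 by fitting an exponent q in ||e||_1 ∝ tol^q between consecutive rows; q ≈ 1/2 means that each fourfold reduction in tolerance halves the error. The sketch uses the Table 2 values as printed; `observed_exponents` is a hypothetical helper.

```python
import math

# Table 2: (local error tolerance x 10^5, global L1 error x 10^3).
TABLE2 = [(128.0, 25.7), (32.0, 12.7), (8.0, 6.20), (2.0, 3.03)]


def observed_exponents(table):
    """Exponent q in  error ~ tol**q  estimated between consecutive rows."""
    out = []
    for (tol0, err0), (tol1, err1) in zip(table, table[1:]):
        out.append(math.log(err0 / err1) / math.log(tol0 / tol1))
    return out
```

All three estimates fall close to 0.51, consistent with the statement in the text.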


Strategy              Normalized   No. of Space-   ||e||_1
                      CPU Time     Time Cells      (x 10^3)
Uniform Mesh           1.000
Coarse Mesh Motion     2.026
Refinement            19.009
Motion & Refinement   26.532         44602          1.88
Fine Mesh Motion       8.584         12690          4.37

Table 3. Normalized CPU time, number of space-time cells, and global L1 error
at t = 0.35 for adaptive and standard solutions of Example 1.


The third experiment involves comparing adaptive solutions obtained using mesh
motion, local mesh refinement, and mesh motion plus local refinement with one obtained
on a uniform mesh. In each case, a 16-element base mesh and an initial time step of
0.0035 were selected. An error tolerance of 0.00002 was used for those solutions that
involved refinement. A fifth solution involving motion of a finer 50-element mesh was
also generated. Data similar to those presented in Table 2 are displayed in Table 3,
comparing the results of different adaptive strategies with those on a stationary uniform
mesh. In Figures 2 to 6 we display the calculated density as a function of x at t = 0, 0.09,
0.18, 0.27, and 0.35, the meshes used, and the time steps selected for each of the solutions
shown in Table 3. The uniform mesh solution shown in Figure 2 exhibits excessive
diffusion at the shock, at the contact surface, and in the expansion region. However, the
time step increases rapidly in accordance with the Courant condition. A larger initial time
step could clearly have been used; however, we wanted to use the same initial time step
for all the cases. In Figure 3 we show that the moving mesh procedure follows the
dominant features of the solution. Results are clearly superior to those in Figure 2, but the
mesh is too coarse to obtain good resolution everywhere. The results in Figure 6
demonstrate that far better resolution is obtained when a finer mesh consisting of 50
elements is used; however, this mesh did not move correctly in the expansion region
because the mesh movement indicator is too small there. The initial mesh generator
distributes a specified number of nodes N based on the initial data. In this case, the initial
data has a jump discontinuity at x = 0.5, so nodes were clustered around that point and then
gradually spread across the domain. There are too many nodes in the expansion region in
relation to the small magnitude of the movement indicator to produce adequate motion there.
A static rezone of the mesh could alleviate this problem. The time steps of both solutions
with mesh moving (Figures 3 and 6) are erratic for small times while the mesh is
adjusting itself to the three breaking waves. When the coarser mesh is used, time steps
increase at the same rate as those for the uniform mesh solution of Figure 2. Incorrect
motion of the fine mesh in the expansion region (Figure 6) prevented a similar increase of
the time step. The results depicted in Figure 4 show that refinement was correctly performed
at all critical points of the calculation. In each case, shocks are captured sharply with the
correct speed. As expected with Davis's [8] artificial viscosity model, diffusive effects are
more pronounced near the contact surface than at the shock. Results obtained using both
mesh motion and refinement are depicted in Figure 5. The results have improved somewhat,
but at the cost of a significantly higher computational effort relative to the solution
of Figure 4. This suggests that, for this problem, mesh motion, with or without refinement,
is not competitive with refinement alone. Additional experimentation is needed to determine
a better combination of mesh moving and refinement.
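The Courant-based time step selection can be sketched for the one-dimensional case. This is an illustration under assumptions, not the authors' FORTRAN code: it computes the pressure from (4) with v = 0, uses the sound speed c = sqrt(γp/ρ), and takes Δt = CFL · min Δx/(|u| + c) with an assumed safety factor CFL = 0.9; `pressure` and `courant_time_step` are hypothetical names. For the Sod initial data on a 16-cell mesh of [-0.2, 1.5] it gives a step of about 0.08, much larger than the initial step of 0.0035, which is consistent with the remark that a larger initial time step could have been used.

```python
import math

GAMMA = 1.4


def pressure(rho, mom, e, gamma=GAMMA):
    """Equation of state (4) in 1D (v = 0): p = (gamma - 1)(e - rho u^2 / 2)."""
    u = mom / rho
    return (gamma - 1.0) * (e - 0.5 * rho * u * u)


def courant_time_step(cells, cfl=0.9, gamma=GAMMA):
    """Largest explicit step dt = cfl * min(dx / (|u| + c)) over all cells.

    cells: iterable of (dx, rho, rho*u, e). The safety factor cfl is an
    assumed value, not one reported in the paper.
    """
    dt = math.inf
    for dx, rho, mom, e in cells:
        u = mom / rho
        c = math.sqrt(gamma * pressure(rho, mom, e, gamma) / rho)
        dt = min(dt, dx / (abs(u) + c))
    return cfl * dt


# Sod initial data: e = p / (gamma - 1) because both gases are at rest.
dx = 1.7 / 16
cells = [(dx, 1.0, 0.0, 2.5)] * 8 + [(dx, 0.125, 0.0, 0.25)] * 8
dt0 = courant_time_step(cells)
```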

Example 2. Consider the solution of the Euler equations (3), (4) in a region exterior to
an infinite cylindrical piston that is expanding radially, creating a radially expanding shock
wave. We ignore the cylindrical symmetry and solve this problem in one quadrant of the
two-dimensional rectangular domain -0.05 ≤ x, y ≤ 0.05 with the two-dimensional algorithm
of Arney and Flaherty [4]. Self-similar solutions of this test problem are obtained
by solving a pair of ordinary differential equations (by numerical integration) for the radial
velocity and acoustic speed [9].

We solved this problem for 0 < t ≤ 0.0096 with the piston initially positioned at a
radius of 0.016023 and having a velocity of 1.6185. Numerical solutions were calculated
on a 26 × 26 spatial mesh (i) without adaptation, (ii) with one level of local refinement,
and (iii) with mesh motion and one level of refinement. Contours of the density at
t = 0.0096 are presented for the exact and three numerical solutions in Figure 7. The spatial
meshes produced by the two adaptive strategies at t = 0.0096 are shown in Figure 8.

Clearly, one level of refinement is not sufficient to adequately resolve the structure of
this solution. We were forced to limit our computations to this level because of memory
restrictions on our computing system. Nevertheless, local refinement with and without
mesh moving provides improvements in accuracy over uniform stationary mesh solutions.
Detailed quantitative comparisons have yet to be performed; however, qualitatively, the
expanding shock is sharper in both adaptive solutions. The combination of mesh motion
and refinement provides additional improvement.


3. CONCLUSIONS. We have applied an adaptive mesh motion and refinement method
for time-dependent partial differential equations to the one- and two-dimensional Euler
equations. Our method can be used with several numerical methods and local error
indicators to produce solutions that satisfy prescribed local tolerances. Mesh motion is global
and is performed at every time step. Mesh refinement is cellular and can be used on
irregular or moving meshes of quadrilateral cells.

Our results indicate that mesh refinement can be used to achieve prescribed levels of
accuracy. Refinement is easy, recursive, and works well. It appears to be computationally
efficient for a given accuracy level. Proper mesh movement improved the computed
results. Refinement has a definite advantage over mesh motion in that it is inferred in an
a posteriori manner from a preliminary solution, whereas our mesh motion is applied in an
a priori fashion by extrapolating the mesh behavior of the previous two base time steps.
As a result, mesh refinement may be inefficient, but it never leads to anomalous behavior.
On the other hand, incorrect mesh motion can easily miss a local nonuniformity in the
solution that evolves suddenly. Such incorrect motion restricts the size of the time steps
and diminishes the overall efficiency of the adaptive method. These difficulties can
largely be overcome by combining mesh motion with mesh refinement and static mesh
redistribution. Further experimentation and analysis are needed in order to determine
optimal combinations of these strategies.

Figure 2. Solutions, mesh trajectories, and time step profile for computations
performed with a stationary uniform mesh of 16 cells for Example 1.

Figure 4. Solutions, mesh trajectories, and time step profile for computations
performed with adaptive local mesh refinement for Example 1.

Figure 5. Solutions, mesh trajectories, and time step profile for computations
performed with both adaptive mesh motion and local mesh refinement for Example 1.

Figure 7. Density contours for Example 2 at t = 0.0096 obtained from the exact
solution (upper left) and by computed solutions on a uniform stationary mesh
(upper right), a uniform stationary base mesh with one level of refinement
(lower left), and a moving base mesh with one level of refinement (lower right).

Figure 8. Spatial meshes at t = 0.0096 for Example 2 using one level of local
mesh refinement on a uniform stationary base mesh (left) and a moving base
mesh (right).

We used the first example to demonstrate that each problem has an optimal initial
base mesh size and that it is computationally more efficient to adaptively refine beginning
with a less than optimal mesh than to start with too fine a mesh. This example also showed
that for mesh motion to be effective, a sufficiently fine base mesh is necessary. A combination
of both the adaptive strategies of mesh motion and refinement produced the best results, but
at the cost of a significantly higher computational effort. The second example demonstrates
that our adaptive mesh procedures extend to two-dimensional problems.

We are currently developing higher-order explicit finite volume methods to replace
the second-order MacCormack scheme. The present Richardson's extrapolation-based error
indicator is expensive, and we are seeking ways of replacing it by using p-refinement
techniques. Such methods have been shown to have an excellent cost/performance ratio when
used in conjunction with finite element methods. We are also working on a modification
of our algorithm which allows a variety of geometries. Our adaptive techniques must be
able to take advantage of the latest advances in vector and parallel computing hardware.
The tree is a highly parallel structure, and we are developing solution procedures that
exploit this in a variety of shared and distributed memory parallel computing environments;
however, it is difficult to parallelize mesh motion because of its global nature.
Cells assigned to a particular processor may migrate to the domain of other neighboring
processors and cause non-trivial bookkeeping problems. Mesh motion is also difficult to
perform in higher dimensions. We are, therefore, actively considering hp-type techniques
in parallel environments.
Abstract

We present the results of a study of line iterative methods for solving linear systems
arising from finite difference discretizations of non-self-adjoint elliptic partial differential
equations on two-dimensional domains. The methods consist of performing one step of
cyclic reduction, followed by solution of the resulting reduced system by line relaxation.
We consider both one-line and two-line relaxation methods, and we present analytic and
experimental results showing that these classes of methods are highly effective for solving
the convection-diffusion equation. The paper summarizes results from [2] and [3], where
further details can be found.


1.  Introduction. 


We consider iterative methods for solving linear systems of the type that arise from
two-cyclic discretizations of two-dimensional elliptic partial differential equations. Such
systems can be ordered using red-black ordering so that they have the form


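$\begin{pmatrix} D & C \\ E & F \end{pmatrix} \begin{pmatrix} u^{(r)} \\ u^{(b)} \end{pmatrix} = \begin{pmatrix} v^{(r)} \\ v^{(b)} \end{pmatrix}$,   (1.1)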


where D and F are diagonal matrices. If block elimination is used to decouple the "red"
points $u^{(r)}$ from the "black" points $u^{(b)}$, the result is a reduced system

$[F - ED^{-1}C]\, u^{(b)} = v^{(b)} - ED^{-1} v^{(r)}$.   (1.2)

Let

$S = F - ED^{-1}C, \qquad s = v^{(b)} - ED^{-1} v^{(r)}$.   (1.3)
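The block elimination leading to (1.2) and (1.3) can be checked on a small example. This sketch uses the one-dimensional three-point Laplacian as a stand-in for the two-dimensional five-point operator (it is also two-cyclic, so alternating points give a red-black splitting with diagonal D) and verifies that solving the reduced system reproduces the black components of the full solution. The helper names are hypothetical, and dense lists are used only to keep the example self-contained.

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def block(A, rows, cols):
    """Extract the submatrix of A with the given row and column indices."""
    return [[A[i][j] for j in cols] for i in rows]


def reduced_system(A, v, red, black):
    """Form S = F - E D^{-1} C and s = v_b - E D^{-1} v_r, as in (1.2)-(1.3).

    Assumes the red-red block D is diagonal, which holds for a red-black
    ordering of a two-cyclic discretization.
    """
    C = block(A, red, black)
    E = block(A, black, red)
    F = block(A, black, black)
    d = [A[i][i] for i in red]                        # diagonal of D
    nr, nb = len(red), len(black)
    S = [[F[i][j] - sum(E[i][k] * C[k][j] / d[k] for k in range(nr))
          for j in range(nb)] for i in range(nb)]
    s = [v[black[i]] - sum(E[i][k] * v[red[k]] / d[k] for k in range(nr))
         for i in range(nb)]
    return S, s


# 1D model: tridiag(-1, 2, -1) on 7 points; even indices red, odd indices black.
n = 7
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
v = [1.0] * n
red, black = [0, 2, 4, 6], [1, 3, 5]
```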


In  this  paper,  we  describe  a  study  of  relaxation  methods  for  solving  (1.2)  when  (1.1) 
comes  from  a  finite-difference  discretization  of  the  constant  coefficient  convection-diffusion 
equation 


$\mathcal{L}u \equiv -\Delta u + \sigma u_x + \tau u_y = f$   (1.4)


*Supported by the U. S. Army Research Office.
with Dirichlet boundary conditions. We consider several orderings of the rows and columns
of S, based on either a one-line ordering or a two-line ordering of the reduced grid. Line
methods of these types have been considered for solving the original problem (1.1) in [1],
[8], and for the reduced system in [4], [7], [8]. For the ordering strategies considered, the
reduced matrices have block Property A, so that Young's analysis of iterative methods
[11] is applicable. We use this analysis to determine the convergence properties of block
Jacobi, Gauss-Seidel, and successive overrelaxation (SOR) methods for solving the discrete
convection-diffusion equation, in terms of the discrete cell Reynolds numbers σh/2 and τh/2.
In addition, we present the results of numerical experiments showing convergence behavior
not revealed by the analysis. Together, the analytic and numerical results show that the
two types of orderings lead to very effective methods for solving (1.4).

An  outline  of  the  paper  is  as  follows.  In  §2,  we  describe  two  discretization  schemes 
for  (1.4),  and  we  examine  the  truncation  error  associated  with  taking  the  reduced  system 
as  an  approximation  of  (1.4).  In  §3,  we  present  the  one-line  and  two-line  orderings  for  the 
unknowns  of  (1.2),  including  variants  based  on  block  red-black  groupings  of  unknowns, 
and  we  outline  the  convergence  analysis  for  line  relaxation  methods.  In  §4,  we  describe 
some  numerical  experiments  that  confirm  and  supplement  the  convergence  results. 

2.  The  convection-diffusion  equation  and  the  reduced  system. 

Consider the two-dimensional convection-diffusion equation (1.4), posed on the unit
square Ω = (0, 1) × (0, 1) with Dirichlet boundary conditions u = g on ∂Ω. Discretization
by a five-point finite difference operator leads to a linear system


Au  =  v 


where u now denotes a vector in a finite dimensional space. We discretize on a uniform
n × n grid using standard second-order differences for the Laplacian [10], [11], and either
centered or upwind differences for the first derivatives. With u ordered lexicographically
in the natural ordering as $(u_{1,1}, u_{2,1}, \ldots, u_{n,n})^T$, the coefficient matrix has the form

$A = \mathrm{tri}[\,A_{j,j-1},\ A_{j,j},\ A_{j,j+1}\,]$.   (2.1)

Here, $\mathrm{tri}[\,X_{j,j-1},\ X_{j,j},\ X_{j,j+1}\,]$ is the (block) tridiagonal matrix whose jth row contains
$X_{j,j-1}$, $X_{j,j}$, and $X_{j,j+1}$ on its subdiagonal, diagonal, and superdiagonal, respectively. We
omit the subscripts when there is no ambiguity. The entries of (2.1) are

$A_{j,j-1} = bI, \quad A_{j,j} = \mathrm{tri}[\,c,\ a,\ d\,], \quad A_{j,j+1} = eI,$

where I is the identity matrix, a, b, c, d, and e depend on the discretization, and all blocks
are of order n. Let h = 1/(n + 1). After scaling by h², the matrix entries are given by

$a = 4, \quad b = -(1 + \delta), \quad c = -(1 + \gamma), \quad d = -(1 - \gamma), \quad e = -(1 - \delta),$
for the centered difference scheme, where γ = σh/2 and δ = τh/2; and

$a = 4 + 2(\gamma + \delta), \quad b = -(1 + 2\delta), \quad c = -(1 + 2\gamma), \quad d = -1, \quad e = -1,$

for the upwind scheme. At the (i, j) grid point, the right-hand side satisfies $v_{ij} = h^2 f_{ij}$,
where $f_{ij} = f(ih, jh)$.
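The coefficient formulas can be packaged as a small function, which also makes a standard observation easy to check: with centered differences the off-diagonal entries are nonpositive only when the cell Reynolds numbers satisfy |γ|, |δ| ≤ 1, while the upwind entries are nonpositive for any σ, τ ≥ 0. The sketch assumes σ, τ ≥ 0 for the upwind case, where the formulas above correspond to backward differences; `five_point_coefficients` and `has_nonpositive_offdiagonals` are hypothetical names.

```python
def five_point_coefficients(sigma, tau, h, scheme="centered"):
    """Entries (a, b, c, d, e) of (2.1) after scaling by h**2.

    gamma = sigma*h/2 and delta = tau*h/2 are the cell Reynolds numbers.
    The upwind formulas here assume sigma, tau >= 0 (backward differences).
    """
    gamma, delta = sigma * h / 2, tau * h / 2
    if scheme == "centered":
        return (4.0, -(1 + delta), -(1 + gamma), -(1 - gamma), -(1 - delta))
    if scheme == "upwind":
        if sigma < 0 or tau < 0:
            raise ValueError("upwind formulas here assume sigma, tau >= 0")
        return (4 + 2 * (gamma + delta), -(1 + 2 * delta), -(1 + 2 * gamma), -1.0, -1.0)
    raise ValueError(f"unknown scheme: {scheme}")


def has_nonpositive_offdiagonals(coeffs):
    """True if the diagonal entry is positive and all off-diagonals are <= 0."""
    a, *off = coeffs
    return a > 0 and all(x <= 0 for x in off)
```

In both schemes the five entries sum to zero, reflecting that the scaled operator annihilates constants.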

In [2], we showed that the reduced matrix S is a skewed nine-point operator. At all
grid points except those bordering ∂Ω, the computational molecule has the form (after
scaling by a) given in Fig. 2.1.


Fig. 2.1: Computational molecules for the reduced system. Top: general case. Bottom left: centered differences. Bottom right: upwind differences.
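The skewed nine-point molecule can be generated mechanically: multiply the equation at a retained point by $a$ and replace each of its four neighbours using that neighbour's own five-point equation. A sketch of this elimination follows; keying molecule entries by $(\Delta i, \Delta j)$ grid offsets is our own convention, not the paper's.

```python
def reduced_molecule(a, b, c, d, e):
    """a * [S u]_{ij} at an interior point, as a dict {(di, dj): coefficient}.

    The equation at the retained point is multiplied by a; each neighbour
    u[i+di, j+dj] is then replaced using its own five-point equation
    a*u_nb = v_nb - (b, c, d, e terms around the neighbour).
    """
    five = {(0, -1): b, (-1, 0): c, (1, 0): d, (0, 1): e}
    molecule = {(0, 0): a * a}
    for (di, dj), w_nb in five.items():
        for (ei, ej), w_far in five.items():
            key = (di + ei, dj + ej)
            molecule[key] = molecule.get(key, 0) - w_nb * w_far
    return molecule


mol = reduced_molecule(4, -1, -1, -1, -1)  # Laplacian: gamma = delta = 0
print(mol[(0, 0)], mol[(0, 2)], mol[(1, 1)])  # 12 -1 -2
```

In general the centre entry is $a^2 - 2be - 2cd$, the entries at distance two along the axes are $-b^2, -c^2, -d^2, -e^2$, and the diagonal entries are $-2bc, -2bd, -2ce, -2de$.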


Suppose centered differences are used to discretize the first derivative terms. At the $(i,j)$ grid point, the discrete operator satisfies $\frac{1}{h^2}[Au]_{ij} = [\mathcal{A}u]_{ij} + O(h^2)$, where $\mathcal{A}$ denotes the differential operator in (1.4); i.e. the truncation error of the discretization is of order $h^2$. The following result shows that the reduced system (1.2) can also be viewed as a discretization of (1.4) with truncation error of order $h^2$. The proof is based on Taylor series; see [3]. A similar analysis shows that the reduced system for the upwind scheme approximates (1.4) with truncation error $O(h)$.

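THEOREM 1. For the centered difference discretization of (1.4), let $S$ and $s$ denote the reduced matrix and right hand side obtained by multiplying the reduced system by $a$ ($= 4$). Then for $2 \le i, j \le n - 1$, $S$ satisfies

$$\frac{1}{8h^2}[Su]_{ij} = -\left[\left(1 + \frac{\sigma^2 h^2}{8}\right)u_{xx} + \left(1 + \frac{\tau^2 h^2}{8}\right)u_{yy}\right] + \sigma u_x + \tau u_y + O(h^2),$$

and $s$ satisfies

$$\frac{1}{8h^2}s_{ij} = f_{ij} + O(h^2).$$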
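The first statement of Theorem 1 can be spot-checked numerically. For a quadratic test function all Taylor terms beyond second order vanish, and for $u = x^2 + 3y^2$ (our choice) the mixed derivative $u_{xy}$ is also zero, so the scaled molecule should reproduce the right-hand side of the theorem, without the $O(h^2)$ remainder, up to rounding. The sample point and the values of $\sigma$, $\tau$ and $h$ are arbitrary.

```python
def scaled_reduced_operator(u, x, y, h, sigma, tau):
    """(1/(8h^2)) * [S u] at (x, y): centered differences, S scaled by a = 4."""
    gamma, delta = sigma * h / 2, tau * h / 2
    a, b, c, d, e = 4.0, -(1 + delta), -(1 + gamma), -(1 - gamma), -(1 - delta)
    molecule = {(0, 0): a * a - 2 * b * e - 2 * c * d,
                (0, -2): -b * b, (-2, 0): -c * c, (2, 0): -d * d, (0, 2): -e * e,
                (-1, -1): -2 * b * c, (1, -1): -2 * b * d,
                (-1, 1): -2 * c * e, (1, 1): -2 * d * e}
    total = sum(w * u(x + di * h, y + dj * h) for (di, dj), w in molecule.items())
    return total / (8 * h * h)


def modified_operator(x, y, h, sigma, tau):
    """Right-hand side of Theorem 1 (O(h^2) term dropped) for u = x^2 + 3y^2."""
    u_x, u_y, u_xx, u_yy = 2 * x, 6 * y, 2.0, 6.0
    return (-((1 + sigma**2 * h**2 / 8) * u_xx + (1 + tau**2 * h**2 / 8) * u_yy)
            + sigma * u_x + tau * u_y)


lhs = scaled_reduced_operator(lambda x, y: x**2 + 3 * y**2, 0.3, 0.7, 1 / 32, 10.0, -6.0)
rhs = modified_operator(0.3, 0.7, 1 / 32, 10.0, -6.0)
print(abs(lhs - rhs))  # at the level of rounding error
```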


3. Convergence of line relaxation methods.

The performance of iterative methods for solving (1.2) depends on the ordering of the underlying grid. In this section, we define the one-line orderings and two-line orderings, and outline the convergence analysis of the resulting iterative methods.

For the one-line orderings, grid points are grouped by diagonal lines oriented at a 45° angle with the horizontal and vertical axes. For the purpose of discussion, we fix the orientation to be along the NW-SE direction. In the natural one-line ordering, the $n - 1$ diagonal lines are numbered starting from one corner (e.g. the SW) from 1 to $n - 1$, and individual points are numbered from bottom to top along the lines. An example for $n = 7$ is shown in the left side of Fig. 3.1, where the line indices are shown outside the grid. In the red-black variant, the lines with odd indices from the natural ordering are ordered first, followed by those with even indices. The individual grid points are renumbered to be consistent with this reordering. An example for $n = 7$ is shown in the right side of Fig. 3.1.
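A sketch of the two one-line orderings. Which colour of points is eliminated is not stated in this section; the code assumes the retained points are those $(i, j)$ with $i + j$ odd, which is the choice that yields the $n - 1$ NW-SE lines described above. Points are written $(i, j)$ with $i$ horizontal and $j$ vertical.

```python
def one_line_orderings(n):
    """Return (lines, natural, red_black) for the reduced grid of an n x n grid.

    Assumption: the retained points are (i, j), 1 <= i, j <= n, with i + j odd,
    so the NW-SE lines i + j = 3, 5, ..., 2n - 1 give n - 1 lines, numbered
    1, ..., n - 1 starting from the SW corner.
    """
    lines = []
    for s in range(3, 2 * n, 2):
        # Points on line i + j = s, numbered from bottom to top (increasing j).
        lines.append([(s - j, j) for j in range(1, n + 1) if 1 <= s - j <= n])
    natural = [p for line in lines for p in line]
    # Red-black: odd-numbered lines first, then even-numbered lines.
    red_black = ([p for line in lines[0::2] for p in line]
                 + [p for line in lines[1::2] for p in line])
    return lines, natural, red_black


lines, natural, red_black = one_line_orderings(7)
print([len(line) for line in lines])  # [2, 4, 6, 6, 4, 2]
```

For $n = 7$ this gives 24 retained points on six lines, as in Fig. 3.1.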




Fig.  3.1:  The  reduced  grid  derived  from  a  7  x  7  grid,  with  natural  one-line  (left)  and 
red-black  one-line  (right)  orderings. 

In the two-line orderings, points in the reduced grid are grouped by pairs of horizontal or vertical lines. Examples with horizontal lines, for $n = 6$, are shown in Fig. 3.2. The left side of the figure shows a natural two-line ordering, and the right side shows a red-black two-line ordering. We restrict our attention to two-line orderings with horizontal lines; generalization to vertical lines is straightforward.

Fig. 3.2: The reduced grid derived from a 6 x 6 grid, with natural two-line (left) and red-black two-line (right) orderings.
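A sketch of the horizontal two-line orderings under the same assumption ($i + j$ odd retained). Inside each pair of rows the code orders points by increasing $i$, which interleaves the two rows; this within-pair ordering, and taking alternate pairs for the red-black variant, are our assumptions. With them, couplings of the nine-point molecule inside a pair reach at most two positions away, consistent with pentadiagonal diagonal blocks.

```python
# Offsets of the off-centre entries of the skewed nine-point molecule.
OFFSETS = [(0, -2), (-2, 0), (2, 0), (0, 2), (-1, -1), (1, -1), (-1, 1), (1, 1)]


def two_line_ordering(n, red_black=False):
    """Group reduced-grid points (i + j odd, assumed) into pairs of rows.

    Rows 1-2 form strip 1, rows 3-4 strip 2, and so on; inside a strip the
    points are ordered by increasing i, which interleaves the two rows.
    """
    strips = []
    for j0 in range(1, n + 1, 2):
        rows = [j for j in (j0, j0 + 1) if j <= n]
        strips.append([(i, j) for i in range(1, n + 1) for j in rows if (i + j) % 2 == 1])
    if red_black:
        strips = strips[0::2] + strips[1::2]  # odd-numbered strips first
    return strips


def strip_bandwidth(strip):
    """Largest position difference between two coupled points of one strip."""
    pos = {p: k for k, p in enumerate(strip)}
    width = 0
    for (i, j), k in pos.items():
        for di, dj in OFFSETS:
            q = (i + di, j + dj)
            if q in pos:
                width = max(width, abs(pos[q] - k))
    return width


strips = two_line_ordering(6)
print(strips[0], [strip_bandwidth(strip) for strip in strips])
# [(1, 2), (2, 1), (3, 2), (4, 1), (5, 2), (6, 1)] [2, 2, 2]
```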

For all these orderings, we use a splitting of the reduced matrix,

$$S = D - C,$$

where $D$ is a block diagonal matrix whose individual blocks come from the underlying lines. Thus, $D$ contains tridiagonal blocks for the one-line orderings and pentadiagonal blocks for the two-line orderings. Consider the block Jacobi iterative method

$$u^{(k+1)} = Bu^{(k)} + D^{-1}s,$$

where $B = D^{-1}C$ is the block Jacobi iteration matrix. The standard measure of the effectiveness of this method is the spectral radius $\rho(B)$; the iteration is convergent provided $\rho(B) < 1$, and convergence tends to be more rapid if $\rho(B)$ is closer to 0 [10]. The following results determine bounds on $\rho(B)$ for both the centered difference and upwind difference schemes, and each of the four orderings defined above; see [2], [3] for proofs.
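The quantities in this section can be computed directly for small grids. The sketch below (NumPy; helper names are ours) forms $S$ as $a$ times the Schur complement of the centered-difference matrix on the retained points $i + j$ odd (an assumption about which colour is kept), in the natural one-line ordering, and evaluates $\rho(B)$ for the block Jacobi splitting. It also evaluates the block Gauss-Seidel radius: with this ordering $S$ is block tridiagonal, so $\rho(\mathcal{L}_1) = \rho(B)^2$ should hold to rounding. Dense eigenvalue routines are used for simplicity.

```python
import numpy as np


def reduced_matrix(n, gamma, delta):
    """Return (S, sizes): a times the Schur complement on the points with i + j odd.

    Centered differences; rows/columns of S follow the natural one-line
    ordering (lines i + j = 3, 5, ..., 2n - 1, each numbered bottom to top).
    """
    a, b, c, d, e = 4.0, -(1 + delta), -(1 + gamma), -(1 - gamma), -(1 - delta)
    index = {(i, j): (j - 1) * n + (i - 1)
             for i in range(1, n + 1) for j in range(1, n + 1)}
    A = np.zeros((n * n, n * n))
    for (i, j), k in index.items():
        A[k, k] = a
        for (di, dj), w in (((0, -1), b), ((-1, 0), c), ((1, 0), d), ((0, 1), e)):
            if (i + di, j + dj) in index:
                A[k, index[(i + di, j + dj)]] = w
    lines = [[(s - j, j) for j in range(1, n + 1) if 1 <= s - j <= n]
             for s in range(3, 2 * n, 2)]
    black = [index[p] for line in lines for p in line]
    black_set = set(black)
    red = [k for k in range(n * n) if k not in black_set]
    # A_BB = A_RR = a*I, so a*(A_BB - A_BR A_RR^{-1} A_RB) = a*A_BB - A_BR A_RB.
    S = a * A[np.ix_(black, black)] - A[np.ix_(black, red)] @ A[np.ix_(red, black)]
    return S, [len(line) for line in lines]


def _block_masks(sizes):
    """Masks for the diagonal blocks and for the block lower part (incl. diagonal)."""
    ids = np.repeat(np.arange(len(sizes)), sizes)
    return ids[:, None] == ids[None, :], ids[None, :] <= ids[:, None]


def block_jacobi_radius(S, sizes):
    """rho(B) for B = D^{-1} C, D = block diagonal of S, C = D - S."""
    diag, _ = _block_masks(sizes)
    D = np.where(diag, S, 0.0)
    return float(np.max(np.abs(np.linalg.eigvals(np.linalg.solve(D, D - S)))))


def block_gauss_seidel_radius(S, sizes):
    """rho((D - L)^{-1} U), where D - L is the block lower part of S."""
    _, lower = _block_masks(sizes)
    M = np.where(lower, S, 0.0)
    return float(np.max(np.abs(np.linalg.eigvals(np.linalg.solve(M, M - S)))))


S, sizes = reduced_matrix(7, 0.5, 0.5)
rho_j = block_jacobi_radius(S, sizes)
rho_gs = block_gauss_seidel_radius(S, sizes)
print(rho_j, rho_gs, rho_j ** 2)
```

For $n = 7$ and $\gamma = \delta = 0.5$ the Gauss-Seidel radius agrees with $\rho(B)^2$ to rounding error.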

THEOREM 2. For the centered difference scheme, if $|\gamma| < 1$ and $|\delta| < 1$, then the spectral radii of the one-line block Jacobi iteration matrices for the reduced system are bounded by

$$\rho(B) \le \frac{\left(\sqrt{1-\gamma^2} + \sqrt{1-\delta^2}\right)^2}{8 - \left(\sqrt{1-\gamma^2} + \sqrt{1-\delta^2}\right)^2 + 2\sqrt{(1-\gamma^2)(1-\delta^2)}\,(1 - \cos \pi h)}.$$

For the upwind difference scheme, the spectral radii are bounded by

$$\rho(B) \le \frac{\left(\sqrt{1+2\gamma} + \sqrt{1+2\delta}\right)^2}{2(2+\gamma+\delta)^2 - \left(\sqrt{1+2\gamma} + \sqrt{1+2\delta}\right)^2 + 2\sqrt{(1+2\gamma)(1+2\delta)}\,(1 - \cos \pi h)}.$$
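The one-line bounds of Theorem 2 are easy to evaluate. The sketch below transcribes the two formulas (the function name is ours; the upwind branch assumes $\gamma, \delta \ge 0$). Squaring the result bounds the Gauss-Seidel radius through $\rho(\mathcal{L}_1) = \rho(B)^2$; with $h = 1/32$ it gives approximately .46 for $\gamma = 0.6$, $\delta = 0$ and .05 for $\gamma = \delta = 0.8$, the corresponding bound entries of Table 4.1.

```python
import math


def theorem2_bound(gamma, delta, h, scheme="centered"):
    """Upper bound on rho(B) for the one-line block Jacobi matrix (Theorem 2)."""
    if scheme == "centered":
        if abs(gamma) >= 1 or abs(delta) >= 1:
            raise ValueError("centered bound requires |gamma| < 1 and |delta| < 1")
        p, q, top = 1 - gamma**2, 1 - delta**2, 8.0
    elif scheme == "upwind":
        p, q, top = 1 + 2 * gamma, 1 + 2 * delta, 2 * (2 + gamma + delta) ** 2
    else:
        raise ValueError("scheme must be 'centered' or 'upwind'")
    s = (math.sqrt(p) + math.sqrt(q)) ** 2
    return s / (top - s + 2 * math.sqrt(p * q) * (1 - math.cos(math.pi * h)))


# Gauss-Seidel bounds rho(B)^2 for two rows of Table 4.1 (h = 1/32).
for g, dl in [(0.6, 0.0), (0.8, 0.8)]:
    print(g, dl, round(theorem2_bound(g, dl, 1 / 32) ** 2, 2))
```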

THEOREM 3. For the centered difference scheme, if $|\gamma| < 1$ and $|\delta| < 1$, then the spectral radii of the two-line block Jacobi iteration matrices for the reduced system are bounded by

$$\rho(B) \le \frac{(1-\delta^2)\cos 2\pi h + 2\sqrt{(1-\gamma^2)(1-\delta^2)}\,\cos \pi h}{8 - \left(\sqrt{1-\gamma^2} + \sqrt{1-\delta^2}\right)^2 - (1-\gamma^2) + 2\sqrt{(1-\gamma^2)(1-\delta^2)}\,(1 - \cos \pi h) + 2(1-\gamma^2)(1 - \cos 2\pi h)} + O(h^2).$$

For the upwind difference scheme, the spectral radii are bounded by

$$\rho(B) \le \frac{(1+2\delta)\cos 2\pi h + 2\sqrt{(1+2\gamma)(1+2\delta)}\,\cos \pi h}{2(2+\gamma+\delta)^2 - \left(\sqrt{1+2\gamma} + \sqrt{1+2\delta}\right)^2 - (1+2\gamma) + 2\sqrt{(1+2\gamma)(1+2\delta)}\,(1 - \cos \pi h) + 2(1+2\gamma)(1 - \cos 2\pi h)}.$$


Note that these results apply for both the natural and red-black variants of the line orderings. The bounds of Theorem 3 are smaller than those of Theorem 2. For the centered difference schemes, the restrictions on $\gamma$ and $\delta$ coincide with the conditions guaranteeing that the discrete solution is nonoscillatory. Some bounds applicable when both $|\gamma| > 1$ and $|\delta| > 1$ can be found in [2], [3].

For all orderings considered, the reduced matrices $S$ have block Property A, so that Young's analysis of relaxation methods applies. In particular, let $C = L + U$, where $L$ and $U$ are strictly lower triangular and upper triangular, respectively, and let $\mathcal{L}_\omega = (D - \omega L)^{-1}[(1-\omega)D + \omega U]$ denote the block SOR iteration matrix. Then $\rho(\mathcal{L}_1) = \rho(B)^2$, and for all of the cases handled by Theorems 2 and 3, the choice

$$\omega^* = \frac{2}{1 + \sqrt{1 - \rho(B)^2}} \qquad (3.1)$$

minimizes $\rho(\mathcal{L}_\omega)$ with respect to $\omega$, with $\rho(\mathcal{L}_{\omega^*}) = \omega^* - 1$.
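Formula (3.1) and the resulting rate take only a few lines. The sketch below also estimates iteration counts with the usual asymptotic rule $k \approx \log(10^{-6}) / \log \rho$; that rule is a heuristic added here for illustration, not part of the analysis above. The input .92 is the one-line Gauss-Seidel radius for $\gamma = 0.2$, $\delta = 0$ reported in Table 4.1, i.e. a value of $\rho(B)^2$.

```python
import math


def optimal_sor(rho_b):
    """omega* from (3.1) and the resulting rate rho(L_omega*) = omega* - 1.

    Valid when Young's theory applies (block Property A, real block Jacobi
    spectrum, 0 <= rho(B) < 1).
    """
    if not 0.0 <= rho_b < 1.0:
        raise ValueError("need 0 <= rho(B) < 1")
    omega = 2.0 / (1.0 + math.sqrt(1.0 - rho_b**2))
    return omega, omega - 1.0


def iterations_to_reduce(rate, tol=1e-6):
    """Asymptotic number of iterations for the error to shrink by a factor tol."""
    return math.ceil(math.log(tol) / math.log(rate))


rho_gs = 0.92  # a Gauss-Seidel radius, i.e. rho(B)^2
omega, rate = optimal_sor(math.sqrt(rho_gs))
print(f"omega* = {omega:.3f}, rho = {rate:.3f}")  # omega* = 1.559, rho = 0.559
print(iterations_to_reduce(rho_gs), iterations_to_reduce(rate))
```

For this entry the estimate falls from 166 Gauss-Seidel iterations to 24 SOR iterations, which illustrates why SOR pays off when Gauss-Seidel is slow.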


4.  Numerical  experiments. 

In  this  section,  we  present  the  results  of  numerical  experiments  that  confirm  and 
supplement  the  convergence  analysis.  We  compare  the  bounds  on  spectral  radii  of  iteration 
matrices  with  computed  spectral  radii,  and  we  examine  the  performance  of  the  Gauss-Seidel 
and  SOR  methods  for  solving  the  reduced  system  arising  from  the  centered  difference 
discretization  of  the  convection-diffusion  equation.  All  computations  were  performed  on 
a  VAX-3600  in  double  precision  Fortran.  The  reduced  matrices  were  computed  using 
PCGPAK [9]. All spectral radii were computed using the QZ algorithm in EISPACK [5], [6].

Table 4.1 shows the computed spectral radii of the one-line Gauss-Seidel iteration matrices for $h = 1/32$ and various choices of the parameters $\gamma$ and $\delta$. Table 4.2 shows analogous data for the two-line Gauss-Seidel iteration matrices. For the one-line ordering, the results under the heading $\delta = 0$ in Table 4.1 are identical to those occurring when $\gamma = 0$ and $\delta$ has the values in the first column of the table. It is evident from the tables that the analytic bounds are very close to the computed values (except for the case of large $\gamma = \delta$, where the analytic bounds come from [2], [3]), and that the computed spectral radii are considerably smaller than one.¹

¹We believe that these eigenvalue computations are affected by ill-conditioning, and that this is why the computed spectral radii exceed the asymptotic bounds.

             δ = 0              γ = δ
  γ     Computed   Bound   Computed   Bound
  .2      .92        ?       .85       .85
  .4      .72        ?       .52       .52
  .6      .45       .46      .22       .22
  .8      .21       .22      .05       .05
 1.0      .051       0        0         0
 1.2      .04        -       .03       .05
 1.4      .06        -       .10       .23
 1.6      .08        -       .19       .61
 1.8      .11        -       .27      1.25
 2.0      .15        -       .35      2.25

Table 4.1: Spectral radii and bounds for the one-line Gauss-Seidel iteration matrices, centered differences, $h = 1/32$. (? = entry illegible in the source; - = no bound available.)

              δ = 0              γ = 0              γ = δ
γ or δ   Computed  Bound    Computed  Bound    Computed  Bound
  .2       .86      .90       .85      .90        ?       .81
  .4       .63      .66       .62      .65       .42      .44
  .6       .38      .40       .34      .36       .16      .16
  .8       .18      .19       .12      .12       .03      .03
 1.0       .062     .02        0        0         0        0
 1.2       .04       -        .04       -        .02      .03
 1.4       .06       -        .09       -        .05      .13
 1.6       .07       -        .13       -        .09      .34
 1.8       .07       -        .18       -        .12      .71
 2.0       .07       -        .22       -        .16     1.27

Table 4.2: Spectral radii and bounds for the two-line Gauss-Seidel iteration matrices, centered differences, $h = 1/32$. (? = entry illegible in the source; - = no bound available.)

Figs. 4.1-4.3 summarize the performance of the block iterative methods for solving various examples of the discrete convection-diffusion equation (1.4) with Dirichlet boundary conditions. In all cases, centered differences were used to discretize the first derivative terms, and the mesh size was $h = 1/32$, so that the order of the linear system was $N = 961$. The curves in the figures represent the average iteration counts for three test problems, determined by three initial guesses with random values in the interval $[-1, 1]$. In all cases, the right hand side $s$ was identically zero. The convergence criterion was $\|r_l\|_2 / \|r_0\|_2 < 10^{-6}$, where $r_l = s - Su^{(l)} = -Su^{(l)}$ is the residual at the $l$'th iteration.

The left side of each of these figures contains results for the one-line orderings, and the right side contains results for the two-line orderings. Fig. 4.1 corresponds to the case $\delta = 0$ (i.e. only the $u_x$ first order term was present in (1.4)), Fig. 4.2 to $\gamma = 0$ (only $u_y$), and Fig. 4.3 to $\gamma = \delta$ ($u_x$ and $u_y$). The results are for the block Gauss-Seidel method with the natural and red-black orderings. In addition, results for the block SOR method with the natural ordering are shown for some choices of $\gamma$ and $\delta$. For SOR, we used the optimal value of $\omega$ determined by (3.1), where $\rho(B)^2$ is taken from the computed values of Tables 4.1-4.2.
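The experimental protocol (zero right-hand side, random initial guesses in $[-1, 1]$, relative residual below $10^{-6}$, average over three runs) can be sketched as follows. The iteration is generic block Gauss-Seidel on any partitioned matrix; to keep the example self-contained it is run on a stand-in matrix, the five-point Laplacian on an $8 \times 8$ grid with one block per grid line, rather than on the reduced matrix $S$. Function names and the iteration cap are our own choices.

```python
import numpy as np


def block_gauss_seidel(S, sizes, s, u0, tol=1e-6, max_iter=500):
    """Block Gauss-Seidel on S u = s; returns (u, iterations), or (u, None).

    Stops when ||s - S u^(l)||_2 / ||s - S u^(0)||_2 < tol, the criterion
    used in the experiments.
    """
    starts = np.concatenate(([0], np.cumsum(sizes)))
    u = np.array(u0, dtype=float)
    r0 = np.linalg.norm(s - S @ u)
    for it in range(1, max_iter + 1):
        for lo, hi in zip(starts[:-1], starts[1:]):
            # Right-hand side for this block, using the newest values elsewhere.
            rhs = s[lo:hi] - S[lo:hi] @ u + S[lo:hi, lo:hi] @ u[lo:hi]
            u[lo:hi] = np.linalg.solve(S[lo:hi, lo:hi], rhs)
        if np.linalg.norm(s - S @ u) / r0 < tol:
            return u, it
    return u, None


# Stand-in problem (not the reduced matrix): 5-point Laplacian on an m x m grid,
# one block per grid line, zero right-hand side, random starts in [-1, 1].
m = 8
T = 4 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
S = np.kron(np.eye(m), T) - np.kron(np.eye(m, k=1) + np.eye(m, k=-1), np.eye(m))
rng = np.random.default_rng(0)
counts = [block_gauss_seidel(S, [m] * m, np.zeros(m * m), rng.uniform(-1, 1, m * m))[1]
          for _ in range(3)]
print(counts, sum(counts) / len(counts))
```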


Fig. 4.1: Average iteration counts, $h = 1/32$, $\delta = 0$.

Fig. 4.2: Average iteration counts, $h = 1/32$, $\gamma = 0$.

Fig. 4.3: Average iteration counts, $h = 1/32$, $\gamma = \delta$.

We make the following observations on these results. In most cases, the Gauss-Seidel method requires thirty or fewer iterations to reach the stopping criterion. The best results are obtained when $\gamma$ or $\delta$ is near one, and performance typically improves as $|\gamma| \to 1$ or $|\delta| \to 1$. Of all the values of $\gamma$ and $\delta$ tested, the self-adjoint case ($\gamma = \delta = 0$) required the largest number of Gauss-Seidel iterations. In these cases, for which the results are not shown in the figures, the stopping criterion was typically not reached after 150 iterations. The best results for large $\gamma$ or $\delta$ are for the two-line orderings with $\delta = 0$ (Table 4.2 and Fig. 4.1). This is because as $|\gamma|$ grows, $S$ essentially consists of its block diagonal $D$ plus a small perturbation. For large $\delta$ and $\gamma = 0$, a vertical two-line splitting would give better results than the horizontal splitting used. SOR was much more effective than Gauss-Seidel when the latter was slow. We examined SOR only in cases where the spectrum of the block Jacobi iteration matrix is real, i.e. where either $|\gamma| < 1$ and $|\delta| < 1$ or (for the one-line ordering [2]) $|\gamma| > 1$ and $|\delta| > 1$. Thus, (3.1) applies. In variable coefficient problems of a similar character, it would be realistic to use an adaptive method to estimate the optimal value of $\omega$ (see e.g. [11]).
ABSTRACT. It is hypothesized that when signal energy is blocked by the
thalamic reticular nucleus, the neocortex, being dynamically unstable, continues
to turn on codons. Successive codons have the characteristic that they share
some neurons. A computer simulation with video output gives us a grasp of the
implications of this hypothesis.

STATEMENT OF THE PROBLEM. The investigation of the application of neural
nets to machine intelligence involves the examination of patterns of activity.
These patterns of activity are most difficult to comprehend. Quantification
does not lead to insight. This paper describes a technique that allows
subjective decisions about the effectiveness of a given configuration.

BACKGROUND. This paper had its genesis in a series of programs called
GROWER [1] that were implemented on a coarse-grained parallel computer made up
of 40 transputers. These programs investigated the mechanics by which a
neocortical net could be interconnected solely by the effects of the incoming signal
energy.  The  more  active  neurons  were  under  pressure  to  extend  their  axons,  and 
the  less  active  were  receptive  to  the  acquisition  of  synapses.  Following  a 
period  of  growth,  there  was  selective  stabilization  of  synapses  depending  on 
sensory  experience.  The  patterns  of  activity  were  compared  with  one  another  by 
computing  a  scalar,  the  cosine  between  the  patterns  when  each  was  characterized 
as  a  vector.  While  these  scalars  allow  one  to  see  whether  a  given  strategy  is 
successful,  there  is  a  feeling  that  one  would  like  to  get  a  more  direct 
intuition  of  the  resemblance  between  patterns. 

When GROWER is running and we give our attention to one input, we are aware
that a unique pattern of activity results, yet to qualify the relationship by
eye is beyond us. Furthermore, if we attend to a given pattern of activity, we
can not say how much or in what way it is related to other patterns we have
seen. One solution is to use recognizable patterns for the cortical activity
... patterns that we can recognize after intervening activity ... patterns for
which we can form a subjective opinion about their correspondence with one
another.

Human  faces  make  up  such  a  set.  We  have  a  specific  mental  faculty  that 
allows  us  to  recognize  faces.  Prosopagnosia  is  the  loss  of  this  ability. 

Patients  can  exhibit  this  highly  specific  form  of  visual  agnosia  following 
injury  to  the  underside  of  the  occipital  lobe  extending  forward  to  the  inner 
side  of  the  temporal  lobes  of  either  or  both  cerebral  hemispheres  [2].  Such  an 
unfortunate  said,  "I  clearly  see  the  details  of  your  face,  your  mouth,  your 
nose,  but  it  is  like  a  blur  ....  I  am  no  longer  able  to  see  a  face  as  a  whole" 
[3].  The  peculiar  specificity  of  this  ability  led  Prof.  Kohonen  to  choose  faces 
for his patterns [4], as is done in this investigation. Not for one second do
we think that such little faces float about on the human cortex. We use them so
that anyone can form his own opinion, at a glance, about how one pattern
resembles another. "It has to be made completely clear in the beginning that
this is not an attempt to model the visual system of animals; optical images are
only used because their quality can easily be esteemed visually" [4, p. 9].

Thinking  is  the  subjective  aspect  of  a  freely-associating  neocortex.  There 
is  always  activity  in  the  living  cerebrum,  such  as  pulses  flowing  along  the 
axons.  If  the  thalamus  is  relaying  incoming  sensory  signals  to  the  cortex,  then 
the  activity  is  this  energy  coursing  through  the  cortex.  If  the  thalamus  blocks 
the sensory input, then the cortical activity is originally traceable to the last
signal  input,  but  because  of  the  unstable  nature  of  the  cortex,  the  following 
activity  becomes  less  and  less  predictable,  but  still  logical  after  the  act. 

If  we  look  at  fluctuating  transmembrane  voltages,  then  all  the  neurons  are 
active  all  the  time.  If  we  look  at  discrete  events,  such  as  axonal  pulses, 
their  asynchronous  nature  is  mind-boggling.  However,  if  we  count  the  pulses 
during  a  time  period,  we  have  a  number,  and  we  can  say  that  some  neurons  are 
more  active  than  others  during  that  period.  Those  neurons  that  are  more  active 
make  up  a  codon.  Of  course  this  is  a  relative  condition;  still  it  does  exist, 
and  we  can  always  set  a  level  that  distinguishes  more  from  less.  A  codon  is  the 
material aspect of what the mind is subjectively aware of as a thought or a
mental image. Unfortunately, if the cortex were exposed and the activity made
visible, we could not perceive the relationship of the activity to the
environment. It would appear as a coruscating twinkling of a myriad of lights,
transposed beyond any human insight. This is exactly the problem we have with
GROWER, but on a much, much larger scale.

Kohonen quotes Aristotle:

"Mental  items  (ideas,  perceptions,  sensations,  or 
feelings)  are  connected  in  memory  under  the  following 
conditions: 

(1) If they occur simultaneously ('spatial contact').

(2) If they occur in close succession ('temporal contact').

(3)  If  they  are  similar. 

(4)  If  they  are  contrary."  [4,  p.  3] 

Our aim is to show how simply association, and therefore thinking, can be
implemented. We use only (1) and (3). The extension to (2) and (4) is
immediate. The life's experience of our simulated cortex consists of a series of
faces. All the pixels of one face are presented simultaneously. This is how we
make use of (1). The faces have areas, large and small, in which they share a
pixel configuration with one or more other faces. In this connection it should
be noted that the background is just as much a part of the "face" as the eyes
and nose. If two faces have a large white or black area in the same part of the
background, then this group of pixels constitutes a "similarity" in the sense of
(3).
It is held that our imaginings are composed solely of past experiences, but
in possible novel combinations. We can imagine an animal with a lion's head, a
goat's body, and a snake's tail. We have experienced all these separately, but
not in this combination. This is called a chimera. On the other hand, it is
impossible to visualize any object as it might appear to an organism sensitive
to ultraviolet radiation. The congenitally blind can not visualize; they can,
however, conceive a spatial extension of the ability to touch. So we hope that
we  will  find  our  machine  cortex  displaying  a  chimera. 

When  all  the  neurons  in  a  column  are  off,  and  some  or  all  the  neighboring 
columns  contain  a  firing  neuron,  then  the  most  likely  neuron  to  turn  on  is  the 
one  that  shares  a  history  with  most  of  these  firing  neurons.  This  means  that  if 
most of the locally firing neurons share a "face," then this face will most
likely be continued in the activated column. However, if this majority is
shared by two or more "faces," then it is not at all clear what will ensue. It
is our hypothesis that this is exactly what happens in the neocortex when we say
we are thinking. The program is a successful simulation in the sense that
chimeras do occur.

This  completes  the  logic  of  the  simulation.  At  any  time,  we  could  simulate 
the  thalamus  passing  through  new  sensory  input  or  chopping  it  off. 

At this point, it is traditional to mathematically demonstrate the relative
likelihood of various scenarios. This is exactly where we suggest that
investigations of intelligent machines have gone astray in the past. A mathematical
demonstration is substituted for laboratory experience with such machines, and
as a result we have statements such as "We also must study the brain at a
theoretical level that investigates the computations that are necessary to
perform its functions" [5].

The  brain  does  not  compute.  The  incoming  signal  energy  flows  through  the 
brain,  but  there  is  no  computation — and  no  need  of  computation. 

APPROACH TO THE PROBLEM. We implemented a computer program called THNKER
that models a section of the neocortex as a 64 by 64 array of columns. These
are neural columns, not matrix columns. Each column contains eight principal
neurons that are mutually inhibitory so that they form a competitive group with
one being dominant during a jiffy. Brain jiffies come eight to ten a second.

THNKER consists of a main module and 14 subroutines written in Fortran, and
organized around named common statements. It is best explained by analyzing the
common statements:

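      LOGICAL*1 CORTEX,ON,SMALON,SAVE,EOF
C     CORTEX(I,J,K,L,M,N): potentiation of the synapse from neuron I of
C     afferent column (J,K) onto neuron L of efferent column (M,N)
      COMMON/SLAB/ CORTEX(8,7,7,8,64,64)
C     Current input frame, frame counter and end-of-input flag
      COMMON/INPUT/ IPIX,KPIX(64,64),EOF
C     Depletion parameters, neuron conditions and on/off states
      COMMON/WRKNG/ CONINC,CONDEC,CONCHK,CONDX(8,64,64),
     1              ON(8,64,64)
C     Column and subregion tallies, indices of the association center
      COMMON/FRAME/ SMALON(64,64),KBGCNT(8,8),BGCNDX(8,8),
     1              KACTIV(2,3)
C     Output TV frame
      COMMON/OUTPUT/ SAVE(64,64)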
Enough programming maturity is assumed in the reader so that he knows that
LOGICAL and CHARACTER bytes can be manipulated as eight-bit integers.

Sensory input is provided by a TV camera via a frame grabber with 256 by
256 pixels of 256 gray levels. To match the computational space available (16
Mbytes), this is reduced to 64 by 64 pixels of eight gray levels. At this
resolution, faces can be recognized. NEXT furnishes these frames, one at a time,
by placing the pixels in KPIX and incrementing IPIX. EOF is returned as false as
long as input frames are available.
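The reduction method itself is not described. A minimal sketch, assuming $4 \times 4$ block averaging followed by uniform quantization to eight gray levels (one per neuron in a column); both choices are our assumptions, and frames are plain lists of rows of 0-255 integers.

```python
def reduce_frame(frame, out=64, levels=8):
    """Block-average a square gray-level frame and quantize it.

    frame: list of rows of ints in 0..255 (e.g. 256 x 256 from the grabber).
    Returns an out x out list of ints in 0..levels-1.
    """
    k = len(frame) // out  # block edge: 4 when reducing 256 -> 64
    reduced = []
    for bi in range(out):
        row = []
        for bj in range(out):
            block = [frame[bi * k + r][bj * k + c] for r in range(k) for c in range(k)]
            mean = sum(block) / len(block)
            row.append(min(levels - 1, int(mean * levels / 256)))
        reduced.append(row)
    return reduced


flat = [[255] * 256 for _ in range(256)]
print(reduce_frame(flat)[0][:4])  # [7, 7, 7, 7]
```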

The  primary  visual  cortex  (area  17)  is  simulated  as  64  by  64  columns 
(barrels)  containing  32,768  neurons.  A  column  contains  eight  'computer' 
neurons,  each  of  which  represents  a  group  of  biological  neurons.  A  group  has  a 
gray  level  as  its  output.  Each  neuron  has  an  efferent  inhibitory  synapse  on  the 
other  seven  neurons  in  its  column  and  efferent  excitatory  synapses  on  the  384 
neurons  closest  to  it,  except  the  neurons  in  its  own  column.  The  array, 
CORTEX(I, J,K,L,M,N) ,  contains  (implicitly  and  explicitly)  all  the  necessary 
working  information.  M  and  N  locate  the  efferent  column,  and  L  the  efferent 
neuron  within  the  column.  J  and  K  identify  the  afferent  column,  and  I  the 
afferent neuron. The byte pointed to contains the potentiation of the
corresponding synapse, and thus the facilitation may have 256 values. A zero value
implicitly represents a missing (discarded or never formed) synapse. The
inhibitory synapses are left implicit, as the output is based on the winner-take-all
scheme.
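The sizes quoted above fit together if $J$ and $K$ index a $7 \times 7$ window of afferent columns centred on the efferent column $(M, N)$, which is our reading of the dimensions of CORTEX: 48 neighbouring columns of 8 neurons give the 384 excitatory synapses per neuron, and one byte per element of CORTEX(8,7,7,8,64,64) comes to about 12.8 Mbytes, inside the 16 Mbytes available. A small sketch of this bookkeeping; the centring convention in the hypothetical helper afferent_column is an assumption, and the cortex border is not handled.

```python
NEURONS = 8  # 'computer' neurons per column
WINDOW = 7   # afferent columns form a 7 x 7 window (our reading of CORTEX)


def afferent_column(j, k, m, n):
    """Cortex column addressed by afferent slot (J, K), 1..7, of efferent column (M, N).

    Slot (4, 4) is the efferent column itself; its neurons are linked only by
    the implicit inhibitory synapses, so that slot carries no excitation.
    """
    centre = WINDOW // 2 + 1
    return m + j - centre, n + k - centre


excitatory_columns = WINDOW * WINDOW - 1
synapses_per_neuron = NEURONS * excitatory_columns
# One LOGICAL*1 byte per element of CORTEX(8,7,7,8,64,64).
cortex_bytes = NEURONS * WINDOW * WINDOW * NEURONS * 64 * 64
print(synapses_per_neuron, cortex_bytes)  # 384 12845056
```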

During the experiential phase (the life experience of the simulated
organism), faces are presented to the cortex by PRESNT. A face causes one neuron in
each column to be excited according to the gray level of the corresponding
pixel. The 4096 neurons that are excited have any mutual synapses potentiated.
This is called Hebbian learning after a hypothesis of D.O. Hebb. We see that
there is a limit of 255 faces that could be experienced before there is a
possibility that some synapse has reached its limit of potentiation. This is not
unreal; we should recall the trouble we have with twins. In real practice, many
thousands of faces could be distinguished, but the reality of time available
limited us to at most a couple of dozen. The only effect on the organism of
this life experience is the potentiated synapses. There are no representations,
no computations that the brain 'must perform,' and no need of them. We hold
this potentiation to be equivalent to all that occurs during experience in the
mammalian brain.
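A sketch of the potentiation step performed by PRESNT, written with a dictionary in place of the byte array and a toy $4 \times 4$ "face" (all names are ours; the radius-3 window follows the $7 \times 7$ reading of CORTEX, an assumption). Each presentation adds one unit of potentiation to every synapse between excited neurons in neighbouring columns, saturating at the byte limit of 255 discussed above.

```python
def present_face(cortex, face, radius=3, cap=255):
    """Potentiate synapses among the excited neurons for one face (sketch of PRESNT).

    face[m][n] is the gray level of pixel (m, n), i.e. which neuron of column
    (m, n) is excited. cortex maps (afferent neuron, afferent column,
    efferent neuron, efferent column) to a potentiation value; a missing key
    plays the role of a zero (absent) synapse.
    """
    rows, cols = len(face), len(face[0])
    for m in range(rows):
        for n in range(cols):
            eff = face[m][n]  # excited neuron of the efferent column
            for jm in range(max(0, m - radius), min(rows, m + radius + 1)):
                for kn in range(max(0, n - radius), min(cols, n + radius + 1)):
                    if (jm, kn) == (m, n):
                        continue  # own column: inhibitory, left implicit
                    key = (face[jm][kn], (jm, kn), eff, (m, n))
                    cortex[key] = min(cap, cortex.get(key, 0) + 1)
    return cortex


face = [[(r + c) % 8 for c in range(4)] for r in range(4)]
cortex = present_face({}, face)
print(len(cortex), max(cortex.values()))  # 240 1
```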

Now  for  free-running  association.  The  signal  energy  is  cut  off.  The 
activity  of  the  cortex  drops.  The  inactive  neurons  replenish  their  molecules. 

As  the  activity  drops  almost  to  nothing,  those  fully-replenished  neurons  in  the 
vicinity  of  the  last  active  neurons  are  triggered  by  the  afferents  they  have 
from  this  dying  activity.  They,  in  turn,  excite  neurons  in  their  vicinity.  For 
easy  viewing,  we  arrange  things  so  that  this  activity  spreads  from  a  point 
(wavelike).  This  is  an  artifice,  solely  for  academic  reasons.  If  the  pattern 
came  on  in  dispersed  points,  as  it  would  when  the  corpus  callosum  is  involved, 
it would be impossible to "see" in the same way as random patterns can not be
"seen."

Following  the  experiential  phase,  there  is  a  period  of  random  association. 
This  follows  the  hypothesis  that  in  the  mammalian  brain  the  thalamus 
periodically (8 to 10 times a second) interrupts the flow of sensory input to
the cortex, and during this period the cortex associates freely. If motor
output does not ensue, this period is extended and the process called thinking
arises.

In DEPLET a concept of "molecular depletion" is introduced. We speculate
that neurons are always more or less active. A period of greater activity "uses
up" certain molecules faster than they can be replenished. After such a period,
a neuron becomes refractory and enters a period of lesser activity. As long as
signal energy is coming in, a neuron can be exercised to the point of
exhaustion. But when the signal energy is interdicted by the thalamus, the internal
condition of a cortical neuron becomes more important, and it is not likely to
fire if the store of needed molecules is mostly depleted.

The condition of each neuron is kept in CONDX(I,J,K). Initially it is set
to 1.0. During each cycle, DEPLET is called and CONDX(I,J,K) adjusted. The
status of the neuron is kept in ON(I,J,K). This logical variable is set to true
if the neuron is on, and false otherwise. If the neuron is on, then CONDX is
decremented by CONDEC. If CONDX falls to zero, then the neuron is turned off.
If the neuron is off at entrance to DEPLET, CONDX is incremented by CONINC. The
end of the refractory period is signaled by the rise of CONDX above a
checkpoint, CONCHK. Satisfactory values are

      CONDEC=0.05
      CONINC=0.1
      CONCHK=0.4
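The DEPLET rules translate into a short update over flat lists, sketched below with CONDEC read as 0.05 and the other defaults as listed. Capping CONDX at its initial value 1.0, and the small tolerance used to decide that CONDX has "fallen to zero", are our assumptions; the text does not say how the condition is bounded above. The returned flags mark neurons whose refractory period has just ended.

```python
def deplet(condx, on, condec=0.05, coninc=0.1, conchk=0.4, cap=1.0):
    """One DEPLET pass over flat lists of conditions (CONDX) and states (ON).

    Returns flags marking neurons whose condition has just risen above
    conchk, i.e. whose refractory period has ended.
    """
    ready = [False] * len(condx)
    for k, is_on in enumerate(on):
        if is_on:
            condx[k] -= condec
            if condx[k] <= 1e-12:  # 'falls to zero', allowing for round-off
                condx[k], on[k] = 0.0, False
        else:
            before = condx[k]
            condx[k] = min(cap, before + coninc)
            ready[k] = before <= conchk < condx[k]
    return ready
```

Starting from CONDX = 1.0, a neuron that stays on is turned off after twenty passes, and it becomes eligible again once CONDX climbs past CONCHK.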

If a neuron is turned off, a special check is made to see if this neuron is
the center of an association. If it is, its indices are removed from KACTIV.
The effects of this action are noted under 'major branch' below. Next, a call
is made to ANLSYS. The cortex is divided into 8 by 8 subregions, each
containing 8 by 8 columns, or 512 neurons. ANLSYS analyzes the condition of this
subdivided cortex. If at least one neuron in a column is on, the corresponding
SMALON(J,K) is set true, otherwise false. Because of the mutual inhibition
involved in a column and the winner-take-all approach, the specifications of the
simulation are that no more than one neuron in a column can be on at any given
time. A tally of the number of neurons in a subregion that are on is placed in
KBGCNT(JA,KA). A summation of the condition of all the neurons in the subregion
(on or off) is placed in BGCNDX(JA,KA). If BGCNDX should be divided by KBGCNT,
it would give the average condition of the neurons in that subregion.
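A sketch of the tallies ANLSYS computes, using nested lists indexed like the Fortran arrays (neuron, row, column) but zero-based; the function and argument names are ours. The subregion width is derived from the array sizes, so the same code serves for the 64 by 64 cortex or a smaller test grid.

```python
def anlsys(on, condx, sub=8):
    """Column and subregion tallies (sketch of ANLSYS).

    on[i][j][k] and condx[i][j][k] describe neuron i of column (j, k).
    Returns SMALON (column has a neuron on), KBGCNT (neurons on per
    subregion) and BGCNDX (summed condition per subregion).
    """
    neurons, size = len(on), len(on[0])
    width = size // sub  # columns per subregion edge (8 for a 64 x 64 cortex)
    smalon = [[any(on[i][j][k] for i in range(neurons)) for k in range(size)]
              for j in range(size)]
    kbgcnt = [[0] * sub for _ in range(sub)]
    bgcndx = [[0.0] * sub for _ in range(sub)]
    for j in range(size):
        for k in range(size):
            ja, ka = j // width, k // width
            for i in range(neurons):
                kbgcnt[ja][ka] += int(on[i][j][k])
                bgcndx[ja][ka] += condx[i][j][k]
    return smalon, kbgcnt, bgcndx
```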

At this point, a major branch in the association cycle occurs: the status
of KACTIV(1,1) is checked. This array contains the indices of neurons that are
the  center  of  active  associations.  Initially  the  array  is  set  to  zero.  It  can 
become  nonzero  in  FRAMER  and  be  reset  to  zero  in  DEPLET.  If  KACTIV (1,1)  is 
zero,  the  program  branches  to  FRAMER.  If  it  is  nonzero,  the  branch  is  to  THINK. 
We  start  with  the  branch  to  FRAMER. 

It is hypothesized that the cortex maintains a level of activity that
fluctuates within limits (unless in a pathological state). The governance of this
condition is unknown, but biologically reasonable. We simulated this governance
by monitoring the activity of our cortex. It is divided into 64 regions.
When the activity in each region is found to be at or below 20 active neurons, a
new wave of activity is started by choosing the region that showed the most
activity. If there is a tie, a random choice is made.

FRAMER checks KBGCNT. If at least one subregion has more than 20 neurons
on  in  its  64  columns,  a  return  is  made.  Otherwise,  the  subregion  that  has  the 
most  active  neurons  is  selected.  If  no  subregion  has  a  single  neuron  on,  then 
one  column  of  the  entire  cortex  is  selected  at  random.  A  random  neuron  of  this 
column  is  turned  on,  and  SMALON  of  the  column  made  true.  Otherwise,  a  nonactive 
column  in  the  identified  subregion  is  selected  at  random.  In  either  case  the 
indices  of  the  selected  column  are  placed  in  KACTIV.  At  this  point,  if  FRAMER 
has  selected  a  nonactive  column,  a  call  is  made  to  ASSOC. 
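FRAMER's decision can be sketched as follows. The return convention is our own: None when some subregion is still above the threshold of 20, otherwise the chosen column together with a flag saying whether ASSOC still has to be called on it. Since at most 20 neurons are on in the chosen subregion, it always has an inactive column to pick.

```python
import random


def framer(kbgcnt, smalon, on, threshold=20, rng=random):
    """Pick the column where a new wave of activity starts (sketch of FRAMER).

    Returns None if some subregion still has more than `threshold` neurons on;
    otherwise ((j, k), needs_assoc), where needs_assoc is True when a
    nonactive column was chosen and ASSOC should now be called on it.
    """
    sub, size = len(kbgcnt), len(smalon)
    width = size // sub
    counts = [c for row in kbgcnt for c in row]
    if any(c > threshold for c in counts):
        return None
    best = max(counts)
    if best == 0:
        # Nothing is on anywhere: turn on a random neuron of a random column.
        j, k = rng.randrange(size), rng.randrange(size)
        on[rng.randrange(len(on))][j][k] = True
        smalon[j][k] = True
        return (j, k), False
    # Most active subregion, ties broken at random.
    ja, ka = rng.choice([(p, q) for p in range(sub) for q in range(sub)
                         if kbgcnt[p][q] == best])
    idle = [(ja * width + r, ka * width + c)
            for r in range(width) for c in range(width)
            if not smalon[ja * width + r][ka * width + c]]
    return rng.choice(idle), True
```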

ASSOC analyzes the selected column. For each neuron of the column whose CONDX is greater than or equal to CONDEC, the afferent excitation is summed; each afferent contribution is taken to be the product of CORTEX and CONDX of the afferent neuron. The neuron that has the greatest excitation is turned on (ON is made true), and SMALON of the column is made true. If no neuron meets these tests, a return is made to FRAMER and then to CYCLE. Eventually a neuron is selected in ASSOC or chosen at random in FRAMER.
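A minimal sketch of the ASSOC selection follows. It assumes that the CONDX test applies to the candidate neuron itself, that the afferent neurons of each neuron are given as an adjacency list, and that a dictionary of weights stands in for CORTEX. All names here are hypothetical.

```python
def assoc(column, condx, condec, afferents, weight):
    """Sketch of ASSOC for one column.

    column:         ids of the neurons in the selected column
    condx[n]:       condition value of neuron n (CONDX)
    condec:         threshold CONDEC
    afferents[n]:   ids of neurons projecting onto n (assumed adjacency list)
    weight[(m, n)]: connection strength from m to n (stands in for CORTEX)
    Returns the neuron to turn on, or None if no neuron qualifies.
    """
    best, best_excitation = None, 0.0
    for n in column:
        if condx[n] < condec:  # assumed: the CONDX test applies to n itself
            continue
        # Excitation = sum over afferents of (weight * CONDX of the afferent).
        excitation = sum(weight[(m, n)] * condx[m] for m in afferents[n])
        if excitation > best_excitation:
            best, best_excitation = n, excitation
    return best


condx = {0: 1.0, 1: 0.2, 2: 0.9, 10: 1.0, 11: 0.5}
afferents = {0: [10], 1: [10, 11], 2: [11]}
weight = {(10, 0): 0.3, (10, 1): 0.9, (11, 1): 0.9, (11, 2): 0.8}
print(assoc([0, 1, 2], condx, 0.5, afferents, weight))  # 2
```

Neuron 1 would receive the most excitation but fails the CONDX test, so neuron 2 (0.8 × 0.5 = 0.4) beats neuron 0 (0.3).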

Now the next time we come to the major branch, KACTIV(1,1) is nonzero, and the path to THINK is taken. In THINK, the column recorded in KACTIV is taken as a center and a radius is set to one. A call is made to ASSOC for each column that is within this radius. This process is continued, with the radius incremented by one, until ASSOC has turned on 64 neurons. This arbitrary number was chosen because of viewer requirements in the final output. Alternatively, a return is made if the entire cortex has been swept with fewer than 64 neurons turned on.
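The expanding sweep in THINK can be sketched as follows. Two assumptions are ours: columns lie on a square grid with square (Chebyshev-distance) rings, and each pass visits only the new ring rather than every column inside the radius. `try_column` is a hypothetical stand-in for a call to ASSOC.

```python
def think(center, size, try_column, target=64):
    """Sketch of THINK on a size x size grid of columns.

    try_column(i, j) stands in for a call to ASSOC on column (i, j) and
    returns True if ASSOC turned a neuron on. Returns the number of
    neurons turned on.
    """
    ci, cj = center
    turned_on = 0
    for radius in range(1, size):  # size - 1 reaches every corner of the grid
        for i in range(ci - radius, ci + radius + 1):
            for j in range(cj - radius, cj + radius + 1):
                if max(abs(i - ci), abs(j - cj)) != radius:
                    continue  # interior column, visited at a smaller radius
                if 0 <= i < size and 0 <= j < size and try_column(i, j):
                    turned_on += 1
                    if turned_on >= target:
                        return turned_on
    return turned_on  # whole cortex swept with fewer than `target` neurons on


print(think((5, 5), 10, lambda i, j: True))  # 64
```

With every call succeeding, rings of 8, 16 and 24 columns give 48 neurons, and the fourth ring reaches the limit of 64.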

This completes the logic of the association cycle. A TV frame is saved by recording a pixel for each column with the gray-scale value of the active neuron (if there is one). Thirty frames are required for each second of the final videotape. The program is run long enough to produce frames for an acceptable viewing period.
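Frame recording can be sketched as below. We assume that the gray value of each column's active neuron is available in a dictionary keyed by the column's grid position, and that columns with no active neuron get a background value of 0. The 30 frames per second rate is the figure stated in the text; the helper names are hypothetical.

```python
FRAMES_PER_SECOND = 30  # videotape rate quoted in the text


def make_frame(gray_of_active, rows, cols, background=0):
    """One TV frame: one pixel per column, holding the gray value of the
    column's active neuron, or `background` if no neuron in it is on."""
    return [[gray_of_active.get((i, j), background) for j in range(cols)]
            for i in range(rows)]


def frames_needed(seconds):
    """Number of saved frames needed for `seconds` of videotape."""
    return FRAMES_PER_SECOND * seconds


print(make_frame({(0, 1): 200}, rows=2, cols=2))  # [[0, 200], [0, 0]]
print(frames_needed(60))  # 1800
```

Because one frame is saved per association cycle, one minute of tape requires 1800 cycles.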

We should note that the actions of various minor routines that provide housekeeping functions have been skipped over as extraneous to the logical flow of the simulation.

RESULTS. THNKER was run many times with various selections of faces. The output was viewed on a video monitor. During the association phase, portions of various faces could be seen, as could blending of the features of one face with those of another. The consensus of viewers is that the action of the network can be followed subjectively. The hoped-for chimeras appeared. The linking of one activity to another could sometimes be seen. A videotape of a typical run was made and presented at the conference.

CONCLUSIONS. This technique is an excellent way to judge the performance of a complicated neural net. The ability of the human eye and brain to recognize a portion of a face, even if presented fleetingly, goes far beyond any practical numerical description of a pattern.

The faces presented constitute the life history of the simulated organism. Extension to time-dependent sequences and moving objects is straightforward, but computationally expensive.
On topological complexity of solving polynomial equations of special type.

A. Libgober*

Department  of  Mathematics 
University  of  Illinois  at  Chicago 
P.O.B.  4348,  Chicago,  Ill,  60680 


Abstract. The notion of Smale's topological complexity is reviewed. Topological and algebro-geometrical problems arising from finding the topological complexity of solving polynomial equations with several vanishing coefficients are formulated. Partial results toward their solutions are stated with an outline of proofs.


In [S] S. Smale introduced the notion of the topological complexity of an algorithm, which provides information on the structure of possible algorithms for solving a given problem rather than on their execution time. Roughly speaking, one assumes that the computation tree consists of nodes and connecting edges and that the nodes are either input nodes (having no incoming edges), computation nodes (having one incoming and one outgoing edge), branching nodes (having one incoming and two outgoing edges), or leaves (halts, with no outgoing edges). The topological complexity of an algorithm is the number of branching nodes in its computation tree (or, equivalently, the number of leaves minus one).
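To make the counting concrete, here is a small sketch that counts branching nodes and leaves in a computation tree and checks the arity of each node kind. The `Node` class, the kind strings and the assumption that an input node has one outgoing edge are our own illustrative choices. With one input node and two-way branching, the two counts always differ by one, which is why either count can serve as the topological complexity.

```python
from dataclasses import dataclass, field

ARITY = {"input": 1, "compute": 1, "branch": 2, "leaf": 0}  # outgoing edges


@dataclass
class Node:
    kind: str  # one of the keys of ARITY
    children: list = field(default_factory=list)


def count(node):
    """Return (branching nodes, leaves) of the computation tree rooted at node."""
    if len(node.children) != ARITY[node.kind]:
        raise ValueError(f"{node.kind} node with {len(node.children)} children")
    branches = int(node.kind == "branch")
    leaves = int(node.kind == "leaf")
    for child in node.children:
        b, l = count(child)
        branches += b
        leaves += l
    return branches, leaves


# input -> compute -> branch(halt, compute -> branch(halt, halt))
tree = Node("input", [Node("compute", [Node("branch", [
    Node("leaf"),
    Node("compute", [Node("branch", [Node("leaf"), Node("leaf")])]),
])])])
print(count(tree))  # (2, 3): topological complexity 2 = 3 leaves - 1
```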

In the same work S. Smale shows how a lower bound for the topological complexity can be reduced to purely topological problems. For an algorithm finding with accuracy $\varepsilon$ the roots of a polynomial from a family of polynomials $F$, one can state that the topological complexity is greater than or equal to the Schwartz genus of the covering map which assigns to an ordered collection of roots of a polynomial from $F$ without multiple roots the collection of its coefficients. Here by the Schwartz genus of a map $f: X \to Y$ one means the minimal number $k$ such that $Y$ admits a cover by $k$ open sets $U_1, \dots, U_k$ ($Y = \bigcup_{i=1}^{k} U_i$) such that $f$ has a section over each $U_i$, i.e. for each $i$ there exists a continuous map $g_i: U_i \to X$ such that $f \circ g_i = \mathrm{id}$.

The Schwartz genus can be estimated from below by the maximal length of a nonzero cup product of elements in $\ker\bigl(H^*(Y;\mathbb{Z}_2) \to H^*(X;\mathbb{Z}_2)\bigr)$. One can use twisted coefficients here instead of $\mathbb{Z}_2$ (cf. [Sch]). Using this method S. Smale ([S]) obtained $(\log_2 n)^{2/3}$ as a lower bound for the topological complexity of finding with accuracy $\varepsilon$ the roots of a polynomial equation in one unknown. On the other hand, in the case when $Y$ is the quotient of $X$ by a free action of a discrete group $G$, one can use the homological genus for any $G$-module $A$ as a lower bound for the Schwartz genus of the quotient map. The $A$-homological genus of a principal $G$-bundle $f: X \to Y$ with fibre a discrete group $G$ and corresponding classifying map $c: Y \to K(G,1)$ (where $K(G,1)$ is the Eilenberg-MacLane space of the group $G$) is the minimal integer $i$ such that the canonical map $H^j(K(G,1), A) \to H^j(Y, c^*(A))$ is trivial for $j > i$ ([Sch]). Using this, V. Vasiljev [V] obtained $n - \min_p D_p(n)$ as a lower bound for Smale's problem, where $D_p(n)$ is the sum of the digits in the $p$-adic expansion of $n$ and the minimum is taken over all primes $p$. He used as $A$ the group of integers $\mathbb{Z}$ with the action of the symmetric group corresponding to the sign representation.
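The two lower bounds quoted above are easy to tabulate. The sketch below computes $(\log_2 n)^{2/3}$ and $n - \min_p D_p(n)$, reading $D_p(n)$ as the digit sum of $n$ in base $p$ as stated in the text. Only primes $p \le n$ need to be tried, since for $p > n$ the number $n$ is a single base-$p$ digit. The function names are ours.

```python
import math


def digit_sum(n, p):
    """D_p(n): sum of the digits of n written in base p."""
    total = 0
    while n:
        n, r = divmod(n, p)
        total += r
    return total


def primes_up_to(n):
    return [q for q in range(2, n + 1)
            if all(q % d for d in range(2, math.isqrt(q) + 1))]


def vasiljev_bound(n):
    """n - min_p D_p(n) for n >= 2. Primes p > n can be skipped because
    then D_p(n) = n and the bound they give is 0."""
    return n - min(digit_sum(n, p) for p in primes_up_to(n))


def smale_bound(n):
    return math.log2(n) ** (2 / 3)


for n in (8, 12, 100):
    print(n, round(smale_bound(n), 3), vasiljev_bound(n))
```

For a prime power $n = p^m$ we have $D_p(n) = 1$, so Vasiljev's bound reaches $n - 1$. For example, it gives 7 for $n = 8$, far above Smale's bound of about 2.08.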

*  Supported  by  NSF  and  U.S.  Army  grants. 
It would be interesting to estimate the topological complexity of solving some special classes of polynomial equations, for example polynomial equations with several vanishing coefficients, or to answer similar questions for systems of polynomial equations (the latter was addressed in [L]). The application of Smale's theory requires rather detailed information on the topology of the complements to discriminants in spaces of special types of polynomials, which does not seem to be available at the moment. This is the problem which we begin to address here. Specifically, the following questions should be answered.

Problem 1. What is the fundamental group of the space of polynomials with several vanishing coefficients? Does the cohomology of this space depend only on this fundamental group, i.e. is the space of polynomials with vanishing coefficients an Eilenberg-MacLane space?

Problem 2. What is the cohomology, with various (twisted) coefficients, of the space of polynomials with several vanishing coefficients? What is its relationship with the cohomology of the symmetric group?

If one considers the space of all monic polynomials, then the answer to Problem 1 goes back to E. Artin ([A]) and Fadell and Neuwirth [FN]: the fundamental group of the space of monic polynomials of degree $n$ without multiple roots is the braid group $B_n$ on $n$ strings, and this space is the Eilenberg-MacLane space of $B_n$. The cohomology of the symmetric group surjects onto the cohomology of the braid group in the case of cohomology with $\mathbb{Z}_2$ coefficients ([S]) or coefficients in the sign representation of the symmetric group ([V]).

Here we shall only indicate a solution for trinomials. First note that in the case of polynomials with several vanishing coefficients, of the form

$$x^n + a_1x^{i_1} + a_2x^{i_2} + \dots + a_{k-1}x^{i_{k-1}} + a_k \qquad (1)$$

the discriminant hypersurface is rather different from the discriminant in the space of all monic polynomials of degree $n$: it may become reducible and have a degree different from that in the generic case (where the degree is $2n - 2$).

Examples of discriminants:

1) For $x^5 + ax^2 + bx + c$ the discriminant is

$$-27a^4b^2 + 2250a^2bc^2 - 1600ab^3c + 3125c^4 + 256b^5 + 108a^5c.$$

2) For $x^6 + ax^3 + bx + c$ the discriminant is

$$27000ab^3c^2 - 1350a^3b^3c + 108a^5b^3 + 3125b^6 + 34992a^2c^4 - 8748a^4c^3 + 729a^6c^2 - 46656c^5.$$

3) For $x^6 + ax^3 + bx^2 + c$ the discriminant is

$$c\,(-1024b^6 - 13824b^3c^2 + 108a^4b^3 - 46656c^4 + 729a^6c + 34992a^2c^3 - 8748a^4c^2 - 8640a^2b^3c).$$
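The three example discriminants can be checked mechanically: any polynomial of the given shape that has a double root must make its discriminant vanish. The sketch below writes each example as an integer polynomial, with the coefficients and signs spelled out in the code, and evaluates it on explicit double-root families. Each family is $(x - t)^2$ times a cofactor chosen so that the monomials absent from the shape cancel. The family helpers are our own constructions, and this is a necessary-condition check, not a derivation.

```python
def disc1(a, b, c):
    """Example 1: discriminant of x^5 + a x^2 + b x + c."""
    return (-27 * a**4 * b**2 + 2250 * a**2 * b * c**2 - 1600 * a * b**3 * c
            + 3125 * c**4 + 256 * b**5 + 108 * a**5 * c)


def disc2(a, b, c):
    """Example 2: discriminant of x^6 + a x^3 + b x + c."""
    return (27000 * a * b**3 * c**2 - 1350 * a**3 * b**3 * c + 108 * a**5 * b**3
            + 3125 * b**6 + 34992 * a**2 * c**4 - 8748 * a**4 * c**3
            + 729 * a**6 * c**2 - 46656 * c**5)


def disc3(a, b, c):
    """Example 3: discriminant of x^6 + a x^3 + b x^2 + c."""
    return c * (-1024 * b**6 - 13824 * b**3 * c**2 + 108 * a**4 * b**3
                - 46656 * c**4 + 729 * a**6 * c + 34992 * a**2 * c**3
                - 8748 * a**4 * c**2 - 8640 * a**2 * b**3 * c)


# (a, b, c) of polynomials of each shape with a double root at x = t.
def family1(t, q):  # (x-t)^2 (x^3 + 2t x^2 + 3t^2 x + q)
    return q - 4 * t**3, 3 * t**4 - 2 * t * q, t**2 * q


def family2(t, p):  # (x-t)^2 (x^4 + 2t x^3 + 3t^2 x^2 + p x + q), q = 2tp - 3t^4
    q = 2 * t * p - 3 * t**4
    return p - 4 * t**3, t**2 * p - 2 * t * q, t**2 * q


def family3(t, s):  # (x-t)^2 (x^4 + 2t x^3 + 3t^2 x^2 + 2ts x + t^2 s)
    return 2 * t * s - 4 * t**3, 3 * t**4 - 3 * t**2 * s, t**4 * s


for disc, family in [(disc1, family1), (disc2, family2), (disc3, family3)]:
    values = {disc(*family(t, u)) for t in range(-3, 4) for u in range(-4, 5)}
    print(disc.__name__, values)  # {0} for each example
```

As a sanity check in the other direction, the square-free polynomial $x^5 + x + 1$ gives `disc1(0, 1, 1) == 3381`, which is nonzero as it should be.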
More generally, one has the following:

Theorem A. The discriminant of the family of polynomials of the form (1) has at most two irreducible components. The number of irreducible components is two if and only if $i_{k-1} \neq 1$, and in this case one of the components is the linear subspace $a_k = 0$. The degree of the discriminant is $n + i_1 - i_{k-1}$.

(The first part of this theorem is obtained in [FS].) In the case of trinomials

$$x^n + ax^k + b \qquad (2)$$

one can give a complete answer to Problem 1 above.

Theorem B. The fundamental group of the space of polynomials of form (2) with no multiple roots is the group of an algebraic link of a type explicitly determined by $n$ and $k$. In particular, if $k = 1$ then the fundamental group of the space of polynomials of form (2) without multiple roots is the group of the torus knot of type $(n, n-1)$, i.e. it admits a presentation with two generators $g_1, g_2$ and one relation $g_1^{n} = g_2^{n-1}$. This space is an Eilenberg-MacLane space for any $n$ and $k$.

Remark: For $k = 1$, by virtue of having such a simple presentation of the fundamental group, one can easily describe its homomorphism into the braid group induced by the embedding of the space of polynomials of form (2) into the space of all polynomials of degree $n$. If $s_1, \dots, s_{n-1}$ are the standard generators of $B_n$, then this homomorphism is given by

$$g_1 \mapsto s_1 \cdots s_{n-1}, \qquad g_2 \mapsto s_1 \cdots s_{n-1} s_1.$$

In particular this map is surjective. This in turn implies that the Galois group of the generic trinomial equation in characteristic zero is the full symmetric group (cf. [Sm], with much milder restrictions on the characteristic of the ground field). This argument can be carried out in the case $k > 1$ as well.
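The Galois-group consequence only needs the composite map $B_n \to S_n$ to be onto. The sketch below checks that part for small $n$ by sending each $s_i$ to the transposition $(i, i+1)$ and closing the images of $g_1$ and $g_2$ under composition. It also confirms that the images satisfy $g_1^n = g_2^{n-1}$ in $S_n$. It does not verify surjectivity onto $B_n$ itself, and the helper names are hypothetical.

```python
from math import factorial


def compose(s, t):
    """Permutation product (s o t)(x) = s(t(x)); permutations are tuples."""
    return tuple(s[t[x]] for x in range(len(t)))


def transposition(n, i):
    """Image in S_n of the braid generator s_{i+1}: swap i and i + 1."""
    perm = list(range(n))
    perm[i], perm[i + 1] = perm[i + 1], perm[i]
    return tuple(perm)


def word(n, letters):
    """Image in S_n of a braid word given by 0-based generator indices."""
    result = tuple(range(n))
    for i in letters:
        result = compose(result, transposition(n, i))
    return result


def power(g, m):
    result = tuple(range(len(g)))
    for _ in range(m):
        result = compose(result, g)
    return result


def generated_order(gens):
    """Size of the subgroup generated by gens; in a finite group,
    closure under products alone already gives the whole subgroup."""
    identity = tuple(range(len(gens[0])))
    seen, frontier = {identity}, [identity]
    while frontier:
        new = []
        for g in frontier:
            for h in gens:
                gh = compose(g, h)
                if gh not in seen:
                    seen.add(gh)
                    new.append(gh)
        frontier = new
    return len(seen)


def images(n):
    """Images in S_n of g1 -> s1...s_{n-1} and g2 -> s1...s_{n-1} s1."""
    return word(n, range(n - 1)), word(n, list(range(n - 1)) + [0])


for n in range(2, 7):
    g1, g2 = images(n)
    print(n, generated_order([g1, g2]) == factorial(n))
```

The image of $g_1$ is an $n$-cycle and the image of $g_2$ is an $(n-1)$-cycle, so both sides of the torus-knot relation map to the identity permutation.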

Sketch of the proof. First notice that the equation of the reduced discriminant of the polynomial (2) is

$$b\left((-1)^{n}k^{k}(n-k)^{n-k}a^{n} - n^{n}b^{n-k}\right) = 0 \qquad (3)$$

if $k > 1$ (cf. [S]). This follows from the fact that a polynomial has a multiple root if and only if it and its derivative have a common root. One can eliminate $x$ from $x^n + ax^k + b = 0$, $nx^{n-1} + kax^{k-1} = 0$ by replacing the last equation by $x^{n-k} = -ka/n$. This is possible assuming $x \neq 0$, which is the case provided $b \neq 0$; $b = 0$ clearly belongs to the support of the discriminant if and only if $k > 1$, which accounts for the first factor in (3). One then substitutes this in the first equation and replaces it by the expression for $x^k$ in terms of $a$ and $b$, after which elimination of $x$ gives the second factor in (3). Now the complex curve $D$ defined by (3) is invariant under a $\mathbb{C}^*$ action on $\mathbb{C}^2$, which implies that the complement to $D$ in $\mathbb{C}^2$ is homotopy equivalent to the complement in the 3-sphere of the link of the only singularity of the curve $D$, namely the singularity at the origin. The Milnor fibration of the link of the singularity of $D$ exhibits the complement to this link as a fibration over the circle with a punctured real surface as fibre, which implies that the complement to the curve $D$ is an Eilenberg-MacLane space. In the case $k = 1$ the equation of the discriminant is given by the vanishing of the second factor in (3). After a change of variables this equation looks like $u^n = v^{n-1}$. The link of the singularity of this curve is the torus knot of type $(n, n-1)$, and the description of the fundamental group of the torus knot cited above is the well-known one. The details of the proofs of both theorems above and the cohomology calculations involved in Problem 2 will appear elsewhere.
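The elimination behind (3) can be spot-checked with exact arithmetic. A trinomial $x^n + ax^k + b$ with a double root at $x = t \neq 0$ has $a = -\tfrac{n}{k}t^{n-k}$ and $b = \tfrac{n-k}{k}t^{n}$, obtained by solving $f(t) = f'(t) = 0$. The sketch below confirms that the second factor, in the form $(-1)^n k^k (n-k)^{n-k} a^n - n^n b^{n-k}$ written out in the code, vanishes at such points. It checks one direction only, and the function names are ours.

```python
from fractions import Fraction


def second_factor(n, k, a, b):
    """(-1)^n k^k (n-k)^(n-k) a^n - n^n b^(n-k)."""
    return (-1) ** n * k ** k * (n - k) ** (n - k) * a ** n - n ** n * b ** (n - k)


def double_root_trinomial(n, k, t):
    """(a, b) such that x^n + a x^k + b has a double root at x = t != 0.

    From f'(t) = 0: a = -(n/k) t^(n-k); then f(t) = 0 gives b = ((n-k)/k) t^n.
    """
    a = Fraction(-n, k) * t ** (n - k)
    b = Fraction(n - k, k) * t ** n
    return a, b


for n, k, t in [(5, 2, 1), (6, 3, 2), (7, 1, Fraction(1, 3))]:
    a, b = double_root_trinomial(n, k, t)
    print(n, k, a, b, second_factor(n, k, a, b))  # last value is 0
```

For $k = 1$ and $n = 3$ the factor reduces to the familiar cubic discriminant $-4a^3 - 27b^2$, so `second_factor(3, 1, 1, 1)` returns $-31$.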
ABSTRACT

A general method is presented for symbolically uncoupling a special class of augmented linear equations with tree topology defined by sparse nonsingular incidence or connectivity matrices. These equations, expressed in terms of excess and shared generalized state variables, are characteristic of p-element open-loop systems. This paper presents an algorithm based on optimal block matrix permutation and factorization that precisely follows system topology and recursively generates a symbolic set of fully uncoupled equations yielding all variables in order p (O(p)) operations. However, the operations coefficient can be relatively large for many problems, and recursion may inhibit full exploitation of vector and parallel processors. Thus an equivalent, compact and highly coupled set of generalized equations is obtained by eliminating the excess variables. The generalized matrix of this set of equations is symbolically manipulated into its natural factors using the previous recursive algorithm to get a new O(p) to O(p^2) solution. This algorithm has a much smaller operations coefficient and can more effectively exploit vector and parallel processing. Iterative refinement is also added to avoid many of the recursive decomposition steps required at each function evaluation, which allows even greater exploitation of vector and parallel processors. The algorithms are also modified to allow any number of the generalized states to be specified and to account for any degree of singularity or redundancy in the system equations.

1. INTRODUCTION

The increase in digital computer capacity and the development of advanced numerical methods have stimulated the desire to model and analyze large-scale systems. When the equations must be solved thousands of times, direct numerical methods are unsuitable because of the excessive computer processing required to manipulate the resulting matrices. The extensive computational overhead and limited computer speed have prompted new searches for more efficient algorithms.

In  general,  formulations  which  incorporate  the  maximum  number  of  variables  yield  the 
largest,  least  coupled  augmented  equation  systems.  Open-loop  or  tree-structured  equations  of 
this  type  can  be  solved  recursively  in  O(p)  operations  (in  many  cases  the  minimum  possible) 
with  careful  algorithm  implementation  [1-3].  However,  the  constant  in  front  of  O(p)  can  be 
relatively  large,  making  recursion  less  effective  than  direct  decomposition  as  the  degree  of 
system  parallelism  increases.  A  combined  algorithm  exploiting  the  sparsity  of  highly 
uncoupled  augmented  equations,  compactness  of  generalized  equations,  iterative  refinement, 
and  vector  and  parallel  processing  can  offer  substantial  computational  advantages  for  many 
applications. 

This  paper  presents  a  brief  overview  of  a  method  for  symbolically  representing  system 
topology  by  two  sparse  connectivity  matrices.  It  is  shown  how  these  matrices  loosely  couple 
the  augmented  system  equations  and  how  they  can  be  used  to  direct  the  recursive  elimination 
and  back  substitution  process.  The  connectivity  matrix  inverses  can  be  used  to  transform  these 
augmented  equations  into  a  maximally  coupled  generalized  set  of  equations.  A  recursive 
algorithm can then be employed to generate symbolic nonsingular natural factors of the
resulting  generalized  coefficient  matrix.  The  algorithms  are  also  modified  so  any  number  of  the 
generalized  states  can  be  routinely  specified,  and  singularities  and  redundant  equations  can  be 
handled.  An  iterative  refinement  algorithm  is  combined  with  the  natural  factors  to  exploit 
vector  and  parallel  processors,  yielding  even  more  efficient  solutions  for  many  applications. 

2. THE RECURSIVE ALGORITHM

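Consider the following loosely coupled augmented equations

$$
\begin{bmatrix} A_{21} & -C_{22} & 0 \\ C_{11} & 0 & -H_{13} \\ 0 & H_{42} & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} b_2 \\ b_1 \\ b_4 \end{bmatrix}
\qquad (1)
$$

or the equivalent highly coupled generalized equations

$$H_{42} R_{22} A_{21} R_{11} H_{13}\, x_3 = b_4 + H_{42} R_{22}\,[b_2 - A_{21} R_{11} b_1] \qquad (2)$$

where

$$R_{11} = C_{11}^{-1} \qquad (3)$$

$$R_{22} = C_{22}^{-1} \qquad (4)$$

$$x_1 = R_{11}\,[b_1 + H_{13} x_3] \qquad (5)$$

and

$$x_2 = -R_{22}\,[b_2 - A_{21} x_1] \qquad (6)$$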


(Throughout this paper the symbols 0 and I will be used to indicate zero and identity matrices, respectively, whose dimensions are implied by the accompanying matrices and vectors.) Equation 2 may be obtained directly from Eq. 1 by eliminating the excess vectors $x_1$ and $x_2$ using Eqs. 5 and 6. Both Eqs. 1 and 2 represent the same system in terms of the generalized state vector $x_3$. The remaining vectors $x_1$, $x_2$, $b_1$, $b_2$ and $b_4$ evolve according to the basic system definitions. Subscripts 1 and 2 associate vectors and matrices with dual spaces whose dimensions may be different. Vectors $x_3$ and $b_4$ belong to dual subspaces of the respective spaces 1 and 2. The dimensions of spaces 1 and 2 are generally equal, as are those of their respective subspaces 3 and 4. Matrix $A_{21}$ may be symmetric and semidefinite, positive semidefinite, or even positive definite. If this is true and $C_{22} = C_{11}^T$ and $H_{42} = H_{13}^T$, then the overall generalized coefficient matrix in Eq. 2 will be symmetric and will have one of the above properties.
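The elimination that turns Eq. 1 into Eq. 2 takes only a few lines of algebra. Read the last block row of Eq. 1 as $H_{42}x_2 = b_4$ and Eq. 6 as $x_2 = -R_{22}[b_2 - A_{21}x_1]$. Substituting Eq. 5 for $x_1$ gives

$$
\begin{aligned}
b_4 &= H_{42}x_2 = -H_{42}R_{22}\,[b_2 - A_{21}x_1] \\
    &= -H_{42}R_{22}\,[b_2 - A_{21}R_{11}(b_1 + H_{13}x_3)],
\end{aligned}
$$

and moving the terms that do not contain $x_3$ to the right-hand side gives

$$H_{42}R_{22}A_{21}R_{11}H_{13}\,x_3 = b_4 + H_{42}R_{22}\,[b_2 - A_{21}R_{11}b_1],$$

which is Eq. 2. Once $x_3$ is known, Eqs. 5 and 6 recover $x_1$ and $x_2$ in turn.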

The big challenge is to represent the governing equations for coupled systems in the augmented form of Eq. 1 or the factored generalized form of Eq. 2. Intuitively one strives to formulate equations in terms of the minimum possible number of variables. This approach results in equations similar to Eq. 2, but unfortunately in the form $Ax = b$, where the internal structure of Eqs. 1 or 2 has been lost. Thus it is important to change one's viewpoint of the problem and first represent individual components of coupled systems as separate entities in terms of the maximum number of state variables. These excess variables are obviously not required for successful formulation of the problem, as the generalized equations indicate; however, they are essential for identification and formulation of the augmented equations. With these thoughts in mind, it may be possible to convert many different formulations into augmented form to take advantage of the algorithms presented in this paper.

Let the topology of a p-element system be defined by two sparse p by p matrices $C_1$ and $C_2$ containing only ±1's and 0's (see Fig. 1 and Eqs. B8 and B9 of Appendix B). Matrix $C_1$ defines a tree representing forward communication or coupling from parent element to child, and matrix $C_2$ defines a tree representing backward communication or coupling from children to parent elements. By special orientation of the communicating element interfaces (all oriented positively outward from the tree roots) and an element naming convention (all named consecutively outward from the roots), row * of $C_1$ (* = a, b, ..., p) specifies that child * receives communication from parent *-1. Since child * always appears after parent *-1 in the naming sequence, this convention causes $C_1$ to be lower triangular with a unity determinant. Any row of $C_1$ corresponding to a child not influenced by a parent will have a single 1 at its diagonal; in reality, this element forms the root of a new tree in $C_1$. The remaining rows will have exactly one 1 at the diagonal and one -1 to the left of the diagonal.


Figure  1.  Forward  and  backward  communicating  trees  of  a  six-element  system 

Similar to matrix $C_1$, row * of $C_2$ (* = a, b, ..., p) specifies which children *+1 communicate back to parent * (there may be none, one or many). Since *+1 is always greater than *, $C_2$ will always be upper triangular with 1's on the diagonal. If parent * receives no communication from any child, the corresponding row * of $C_2$ will have only a single 1 at its diagonal; this is a terminal or leaf element in $C_2$. Allowing $C_2 \neq C_1^T$ permits unidirectional communication, which may be useful in many applications.
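The construction of $C_1$ and $C_2$ from a parent list can be sketched as follows. This is a minimal sketch with scalar ±1 entries. `parent[k]` gives the parent of element k (or -1 for a root), and the optional `sends_back` flags model children that do not communicate back, which is the unidirectional case $C_2 \neq C_1^T$. Both names and the six-element example tree are hypothetical.

```python
def topology_matrices(parent, sends_back=None):
    """Build C1 (forward tree) and C2 (backward tree) for elements 0..p-1.

    parent[k]:     parent of element k, or -1 if k is the root of a tree;
                   parents must be named before their children
    sends_back[k]: True if child k communicates back to its parent
                   (default: all True, which gives C2 = C1^T)
    """
    p = len(parent)
    if sends_back is None:
        sends_back = [True] * p
    C1 = [[int(i == j) for j in range(p)] for i in range(p)]
    C2 = [[int(i == j) for j in range(p)] for i in range(p)]
    for k, par in enumerate(parent):
        if par < 0:
            continue  # root of a tree in C1: row k keeps a lone 1
        if par >= k:
            raise ValueError("parents must be named before their children")
        C1[k][par] = -1  # row k: child k receives from parent par
        if sends_back[k]:
            C2[par][k] = -1  # row par: parent par receives from child k
    return C1, C2


C1, C2 = topology_matrices([-1, 0, 1, 1, 0, 4])
for row in C1:
    print(row)
```

Because the naming convention puts every parent before its children, $C_1$ is lower triangular with a unit diagonal, so its determinant is 1 without any computation.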

Since the system is composed of p elements, there is a natural partitioning of vectors $x_1$, $x_2$, $x_3$, $b_1$, $b_2$ and $b_4$ into p corresponding subvectors. Likewise block diagonal matrices $A_{21}$, $H_{13}$ and $H_{42}$ are partitioned into p submatrices consistent with the subvector dimensions. Block lower triangular matrix $C_{11}$ has the same block sparsity pattern as $C_1$, and its submatrix dimensions are compatible with the components of $x_1$. In a similar manner, block upper triangular matrix $C_{22}$ has the same block sparsity pattern as $C_2$, and its submatrix dimensions are compatible with the components of $x_2$. If $C_2 = C_1^T$, the generalized coefficient matrix in Eq. 2 will be block-symmetric, and this matrix is represented by an undirected graph, or simply a graph. Otherwise it is represented by a directed graph, or digraph [4]. In either case, all of the algorithms developed in this paper apply. A given subvector of a vector, or submatrix of a block diagonal matrix, is referenced by appending an additional subscript to the corresponding symbol. For example, $x_{1\star}$ refers to the *th subvector of $x_1$ and $A_{21\star}$ refers to the *th block diagonal submatrix of $A_{21}$.

With this convention, Eq. 1 can be described in more detail as follows. A typical submatrix equation in the first set of equations in Eq. 1 is

$$A_{21,\star-1}\,x_{1,\star-1} - x_{2,\star-1} + \sum_{\star \text{ on } \star-1} E_{22\star}\,x_{2\star} = b_{2,\star-1} \qquad (7)$$

where the submatrices $E_{22\star}$ are transformations or projections (not necessarily orthogonal or invertible) which take $x_{2\star}$ into $x_{2,\star-1}$ coordinates. The dimensions of vectors $x_{2\star}$ and $x_{2,\star-1}$ may be different. These matrices depend on the individual coordinates selected to represent the system being modeled. The $-E_{22\star}$ block submatrices fall into the off-diagonal block locations of $C_{22}$ in accordance with the location of the off-diagonal -1's in $C_2$. The summation in Eq. 7 arises because parent *-1 may receive communication from none, one or many children, hence the notation "* on *-1". The corresponding block row *-1 of $-C_{22}$ in Eq. 1 has just the right number of $E_{22\star}$'s to pick up and transform the subvectors of the children influencing *-1.

In a similar manner a typical submatrix equation in the second set of equations in Eq. 1 is

$$x_{1\star} - E_{11\star}\,x_{1,\star-1} - H_{13\star}\,x_{3\star} = b_{1\star} \qquad (8)$$

where submatrix $E_{11\star}$ is a transformation or projection (again not necessarily orthogonal or invertible) which takes $x_{1,\star-1}$ into $x_{1\star}$ coordinates. The $-E_{11\star}$'s fall into the off-diagonal block locations of $C_{11}$ in accordance with the location of the off-diagonal -1's in $C_1$.

Finally, a typical submatrix equation in the last set of equations in Eq. 1 is simply

$$H_{42\star}\,x_{2\star} = b_{4\star} \qquad (9)$$


For Eq. 1 to have a solution, Eqs. 2 to 6 indicate that it is necessary and sufficient for the generalized matrix $H_{42} R_{22} A_{21} R_{11} H_{13}$ to be nonsingular, so that $x_3$ can be evaluated. This implies that it is necessary for matrix $H_{42}$ to have at least full row rank, for matrix $H_{13}$ to have full column rank, and for subspaces 3 and 4 to have the same dimension, but this is not sufficient. Situations in which the coefficient matrix in Eq. 1 or 2 is singular will be addressed later. Thus Eqs. 2 to 6 and Eq. 1 are equivalent, and this paper is concerned with the symbolic generation of the natural factors of the generalized matrix in Eq. 2 using a recursive algorithm for solving Eq. 1.
Using the topological information stored in matrices $C_{11}$ and $C_{22}$, the following natural symbolic factors

$$L_3\,D_{34}\,U_4 = [H_{42} R_{22} A_{21} R_{11} H_{13}]^{-1} \qquad (10)$$

and

$$U_4^{-1}\,D_{34}^{-1}\,L_3^{-1} = H_{42} R_{22} A_{21} R_{11} H_{13} \qquad (11)$$

will be obtained in a later section. Matrices $L_3$ and $U_4$ and their inverses are block lower and upper triangular with 1's on the diagonal and have block sparsity patterns identical to $R_{11}$ and $R_{22}$, respectively, and $D_{34}$ is block diagonal. This presumes that matrix $D_{34}$ exists, which may not be true in many cases, such as when subspaces 3 and 4 have different dimensions. Such singularity problems will be addressed later. In the following development, however, assume that there are no singularities and that all matrices can be evaluated.

The basic recursive algorithm for solving Eq. 1 was developed in [1] and is repeated here with modifications. The labels in the following algorithm, e.g. ROL1, stand for "Recursive Open Loop step 1", etc. Recall that the special ordering and orientation of the joints and elements made $C_{11}$ block lower triangular and $C_{22}$ block upper triangular with all 1's on the diagonal. Thus, solving equations of the form $C_{11} x_1 = b_1$ is best done by evaluating the first subvector of $x_1$ and successively evaluating adjacent subvectors in forward order, in this case from a to p. Likewise $C_{22} x_2 = b_2$ is solved from the last subvector to the first, or in reverse order from p to a. The first situation amounts to optimal traversal of the $C_1$ tree from root toward leaves, and the second to traversal from leaves toward root in the $C_2$ tree. The sequence in step ROL4 of the following algorithm is executed in forward order, and steps ROL2, ROL3 and ROL5 in reverse order. Because of the special ordering of element names, decrementing or incrementing * in the algorithm means "move to the adjacent symbol and the corresponding adjacent row and column in $C_{11}$ or $C_{22}$." Since adjacent elements may not have adjacent symbols, as noted earlier, reference to *-1 means "select the adjacent or parent element closer to the root of the $C_1$ tree and the corresponding parent row or column of $C_{11}$, not necessarily the previous element, row or column in the sequence." The symbol ← in steps ROL2, ROL3 and ROL5 means "assign the quantity in the right expression to the left expression." This is equivalent to summing projected quantities from the one or more communicating children onto their parent, and it does not disturb the natural sequence of the recursive algorithm. If the system should have more than one tree, then each can be processed independently of the others by repeating the following algorithm with a different set of matrices. As an aid to understanding this and the following algorithms, a comprehensive example is developed in Appendix B.

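RECURSIVE OPEN LOOP ALGORITHM

ROL1 Evaluate the components of $A_{21}$, $H_{13}$, $H_{42}$, $b_1$, $b_2$, $b_4$, $C_{11}$ and $C_{22}$

ROL2.0 Initialize $\hat{A}_{21} \leftarrow A_{21}$
ROL2.1 For * = p to a repeat

$$
\begin{aligned}
B_{41\star} &= H_{42\star}\,\hat{A}_{21\star} \\
B_{23\star} &= \hat{A}_{21\star}\,H_{13\star} \\
D_{34\star} &= [B_{41\star}\,H_{13\star}]^{-1} \\
F_{31\star} &= D_{34\star}\,B_{41\star} \\
F_{23\star} &= B_{23\star}\,D_{34\star} \\
\hat{A}_{21,\star-1} &\leftarrow \hat{A}_{21,\star-1} + E_{22\star}\,[\hat{A}_{21\star} - B_{23\star}F_{31\star}]\,E_{11\star}
\end{aligned}
$$

Skip the last equation when * corresponds to any row of $C_{11}$ or column of $C_{22}$ with only a 1, since the corresponding submatrix $E_{11\star}$ or $E_{22\star}$ is zero

ROL3.0 Initialize $\hat{b}_2 \leftarrow b_2$
ROL3.1 For * = p to b repeat

$$
\begin{aligned}
x_{3\star} &= D_{34\star}\,[b_{4\star} + H_{42\star}\hat{b}_{2\star}] - F_{31\star}\,b_{1\star} \\
\hat{b}_{2,\star-1} &\leftarrow \hat{b}_{2,\star-1} + E_{22\star}\,[\hat{b}_{2\star} - \hat{A}_{21\star}(b_{1\star} + H_{13\star}x_{3\star})]
\end{aligned}
$$

Skip the second equation when * corresponds to any column of $C_{22}$ with only a 1
ROL3.2 $x_{3a} = D_{34a}\,[b_{4a} + H_{42a}\hat{b}_{2a}] - F_{31a}\,b_{1a}$

ROL4 $x_{1a} = b_{1a} + H_{13a}\,x_{3a}$
ROL4.1 For * = b to p repeat

$$
\begin{aligned}
x_{3\star} &\leftarrow x_{3\star} - F_{31\star}\,[E_{11\star}\,x_{1,\star-1}] \\
x_{1\star} &= E_{11\star}\,x_{1,\star-1} + b_{1\star} + H_{13\star}\,x_{3\star}
\end{aligned}
$$

The vector $E_{11\star}x_{1,\star-1} = 0$ when * corresponds to any row of $C_{11}$ with only a 1

ROL5.0 Initialize $x_2 = -b_2 + A_{21}\,x_1$
ROL5.1 For * = p to b repeat

$$x_{2,\star-1} \leftarrow x_{2,\star-1} + E_{22\star}\,x_{2\star}$$

Skip this equation when * corresponds to any column of $C_{22}$ with only a 1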
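A minimal executable sketch of steps ROL2 through ROL5 follows, as we read them, under stated assumptions. The blocks in spaces 1 and 2 are 2×2 and subspaces 3 and 4 are one-dimensional, so each $D_{34\star}$ is a scalar inverse. There is a single tree whose $C_1$ and $C_2$ share the same parent links, and all arithmetic uses exact rationals. The helper names (`rol_solve`, `residuals`, `mm`, `madd`) and the numbers in the example tree are hypothetical. The sketch verifies its own output by substituting the result back into Eqs. 7, 8 and 9.

```python
from fractions import Fraction as Fr


def mm(X, Y):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def madd(X, Y, s=1):
    """Elementwise X + s*Y."""
    return [[x + s * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def rol_solve(parent, A, E11, E22, H13, H42, b1, b2, b4):
    """ROL2 to ROL5 for one tree; parent[k] < k, and -1 marks the root.

    Assumes subspaces 3 and 4 are one-dimensional, so each D34 block is 1x1.
    """
    p = len(parent)
    Ah = [[row[:] for row in M] for M in A]                      # ROL2.0
    D, F31 = [None] * p, [None] * p
    for k in reversed(range(p)):                                 # ROL2.1
        B41, B23 = mm(H42[k], Ah[k]), mm(Ah[k], H13[k])
        D[k] = [[1 / mm(B41, H13[k])[0][0]]]
        F31[k] = mm(D[k], B41)
        if parent[k] >= 0:
            reduced = madd(Ah[k], mm(B23, F31[k]), -1)
            Ah[parent[k]] = madd(Ah[parent[k]], mm(mm(E22[k], reduced), E11[k]))
    b2h = [[row[:] for row in v] for v in b2]                    # ROL3.0
    x3 = [None] * p
    for k in reversed(range(p)):                                 # ROL3.1 and ROL3.2
        x3[k] = madd(mm(D[k], madd(b4[k], mm(H42[k], b2h[k]))), mm(F31[k], b1[k]), -1)
        if parent[k] >= 0:
            r = madd(b2h[k], mm(Ah[k], madd(b1[k], mm(H13[k], x3[k]))), -1)
            b2h[parent[k]] = madd(b2h[parent[k]], mm(E22[k], r))
    x1 = [None] * p
    for k in range(p):                                           # ROL4 and ROL4.1
        if parent[k] >= 0:
            e = mm(E11[k], x1[parent[k]])
            x3[k] = madd(x3[k], mm(F31[k], e), -1)
        else:
            e = [[0] for _ in b1[k]]
        x1[k] = madd(madd(e, b1[k]), mm(H13[k], x3[k]))
    x2 = [madd(mm(A[k], x1[k]), b2[k], -1) for k in range(p)]    # ROL5.0
    for k in reversed(range(p)):                                 # ROL5.1
        if parent[k] >= 0:
            x2[parent[k]] = madd(x2[parent[k]], mm(E22[k], x2[k]))
    return x1, x2, x3


def residuals(parent, A, E11, E22, H13, H42, b1, b2, b4, x1, x2, x3):
    """Left side minus right side of Eqs. 7, 8 and 9 for every element."""
    out = []
    for k in range(len(parent)):
        r7 = madd(madd(mm(A[k], x1[k]), x2[k], -1), b2[k], -1)
        for c in range(len(parent)):
            if parent[c] == k:
                r7 = madd(r7, mm(E22[c], x2[c]))
        r8 = madd(madd(x1[k], mm(H13[k], x3[k]), -1), b1[k], -1)
        if parent[k] >= 0:
            r8 = madd(r8, mm(E11[k], x1[parent[k]]), -1)
        out += [r7, r8, madd(mm(H42[k], x2[k]), b4[k], -1)]
    return out


def F(rows):
    return [[Fr(v) for v in r] for r in rows]


# Hypothetical tree: a is the root, b and c hang from a, d hangs from b.
parent = [-1, 0, 0, 1]
A = [F([[2, 1], [1, 3]]), F([[4, 1], [1, 2]]), F([[3, 0], [0, 1]]), F([[2, 1], [1, 2]])]
E11 = [F([[1, 1], [0, 1]])] * 4
E22 = [F([[1, 0], [1, 1]])] * 4  # E22 = E11^T and H42 = H13^T: symmetric case
H13 = [F([[1], [0]]), F([[0], [1]]), F([[1], [1]]), F([[1], [2]])]
H42 = [F([[1, 0]]), F([[0, 1]]), F([[1, 1]]), F([[1, 2]])]
b1 = [F([[1], [2]]), F([[0], [1]]), F([[3], [-1]]), F([[2], [2]])]
b2 = [F([[1], [0]]), F([[2], [1]]), F([[0], [0]]), F([[-1], [1]])]
b4 = [F([[1]]), F([[0]]), F([[2]]), F([[-3]])]

x1, x2, x3 = rol_solve(parent, A, E11, E22, H13, H42, b1, b2, b4)
```

The example data were chosen symmetric, with each $A_{21\star}$ positive definite, so every $B_{41\star}H_{13\star}$ is positive and the scalar inverses in ROL2.1 exist. The recursion itself does not rely on this symmetry.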
3. NATURAL FACTORS OF THE GENERALIZED MATRIX

Optimal block permutation and U-L factorization applied to the coefficient matrices of tree-structured systems result in an absolute minimum block fill pattern in the U and L matrices [4]. This is achieved by selecting forward elimination and back substitution sequences which precisely follow tree topology and never jump across (from branch to branch) unprocessed elements. Elimination and back substitution each require p sets of the recursive operations in the above algorithm. Instructions for completing recursive forward sweep step * in ROL4.1 come from cell row * of $C_{11}$ and can be represented by an elementary matrix $C_{11\star}$, which corresponds to matrix $C_{11}$ with all off-diagonal cell entries zeroed except in cell row *. This leaves exactly one off-diagonal block matrix $-E_{11\star}$ in cell row *, or none in the case of a root or noncommunicating parent [5]. Matrix $C_{11\star}$ is a factor of $C_{11}$, not a submatrix, and thus has the same dimension as $C_{11}$. It follows that the p sequential instructions for the complete recursive process can be factored into the product


Cii  -  fj  Ci 


1* 


(12) 


Further  inspection  of  each  elementary  matrix  C„*  reveals  that  its  inverse  can  be  obtained  simply 
by  reversing  the  single  off-diagonal  block  matrix  sign.  Thus  from  Eq.  3  it  follows  that 


R,i-nC 


In  a  similar  manner 

a 

C*-n  Ca* 

*-p 

and 


Ra-n  C. 


22* 


(13) 


(14) 


(15) 


As  above,  elementary  matrix  C^*  corresponds  to  Ca  with  all  off-diagonal  entries  zeroed  except 
in  cell  column  *  (which  again  leaves  exactly  one  (or  none  in  case  of  a  root  or 

noncommunicating  child)  off-diagonal  block  matrix  -  Ej**  in  cell  column  *).  Again,  C^,  is  not 
a  submatrix  but  a  factor  of  Cz>. 


Matrix  C,,*  can  be  envisioned  as  selecting  and  coupling  subvectors  x,*  and  x,*.,  into  the 

sum  x1#  -  E„*  x,*.,  in  the  composite  arrays  of  subvectors.  Matrix  Cn  couples  the  entire  set  of 
subvector  sums  where  the  components  are  either  equal  to  or  added  to  other  subvectors  with  * 

subscripts  (see  Eqs.  8  and  1).  For  example,  suppose  x,.  -  E„*  x,*.,  -  b„  + ...  then  Cn  x,  -  bi  + ... 
represents  the  coupled  system  of  subvectors.  Reverse  order  of  products  in  Eq.  13  still  yields  a 
nonsingular,  lower  triangular  matrix  with  l's  on  the  diagonal,  but  the  degree  of  sparsity  in  R,,. 
relative  to  C,„  is  a  function  of  the  degree  of  parallelism,  whereas  Cn  sparsity  is  a  function  only 
of  the  number  of  trees  and  elements  in  the  system.  The  minimum  fill  pattern  in  Cn  compared  to 
Rn  is  what  makes  recursive  algorithms  so  attractive  for  solving  highly  sequential  problems  on 
serial  processors. 
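A small numerical check of Eqs. 12 and 13 may help. The sketch below is Python/NumPy with scalar (1 x 1) blocks; the tree is the six-element example of Appendix B and the values of $E_{11*}$ are hypothetical random numbers. It builds each elementary factor $C_{11*}$, confirms that their product taken in element order reproduces $C_{11}$ without fill, that each inverse is the same factor with the off-diagonal sign reversed, and that the reverse-order product of the inverses gives $R_{11}$.

```python
import numpy as np

# Hypothetical example: scalar (1 x 1) blocks on the six-element tree of Appendix B.
parent = [None, 0, 1, 1, 0, 4]            # a..f -> 0..5; parent[k] is element "*-1"
p = len(parent)
rng = np.random.default_rng(1)
E11 = {k: rng.uniform(0.5, 2.0) for k in range(1, p)}

def elementary(k, sign=-1.0):
    """C11*: identity plus sign * E11* in cell row k, column parent[k] (none at the root)."""
    M = np.eye(p)
    if parent[k] is not None:
        M[k, parent[k]] = sign * E11[k]
    return M

C11 = np.eye(p)
for k in range(1, p):
    C11[k, parent[k]] = -E11[k]

# Eq. 12: the product taken in element order a, b, ..., p reproduces C11 exactly.
prod = np.eye(p)
for k in range(p):
    prod = prod @ elementary(k)
assert np.allclose(prod, C11)

# The inverse of each factor is the same factor with the off-diagonal sign reversed.
for k in range(p):
    assert np.allclose(elementary(k) @ elementary(k, sign=+1.0), np.eye(p))

# Eq. 13: R11 is the product of the inverses in the reverse order.
R11 = np.eye(p)
for k in reversed(range(p)):
    R11 = R11 @ elementary(k, sign=+1.0)
assert np.allclose(R11 @ C11, np.eye(p))

fill_C11 = np.count_nonzero(np.tril(C11, -1))
fill_R11 = np.count_nonzero(np.tril(R11, -1))
print("off-diagonal nonzeros: C11 =", fill_C11, "R11 =", fill_R11)
```

For this tree the counts are 5 for $C_{11}$ and 8 for $R_{11}$; a serial chain of six elements would instead fill all 15 entries below the diagonal of $R_{11}$, which is the dependence on parallelism described above.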
In a similar manner $C_{22}$ couples a set of equations with subvector sums of the form $x_{2,*-1} - \sum_{* \text{ on } *-1} E_{22*}x_{2*}$, which are equal to or added to other subvectors with $*-1$ subscripts (see Eqs. 1 and 7).

The following matrices are assumed to operate on subvectors whose topology is defined by $C_1$. Let

$$B_{i1} = \mathrm{diag}[B_{i1a}, B_{i1b}, \ldots, B_{i1p}] \quad (i = 1, 3) \tag{16}$$

and consider a set of subvectors of the form $-B_{31*}E_{11*}x_{1,*-1}$ which are equal to or added to other subvectors with $*$ subscripts. The product $B_{31}[C_{11} - I]x_1$ represents the coupled system of subvectors. Likewise the product $[I + B_{11}[C_{11} - I]]x_1$ represents the coupled system of subvectors of the form $x_{1*} - B_{11*}E_{11*}x_{1,*-1}$ which are equal to or added to other subvectors with $*$ subscripts. In the latter case, with $i = 1$, the submatrices in Eq. 16 must all be square, but not necessarily nonsingular.

In a similar manner the following matrices

$$B_{2j} = \mathrm{diag}[B_{2ja}, B_{2jb}, \ldots, B_{2jp}] \quad (j = 1, 2, 4) \tag{17}$$

operate on subvectors whose topology is defined by $C_2$. The product $[C_{22} - I]B_{2j}b_j$, $j = 1, 4$, couples subvectors $-\sum_{* \text{ on } *-1} E_{22*}B_{2j*}b_{j*}$, $j = 1, 4$, and $[I + [C_{22} - I]B_{22}]$ couples subvectors $b_{2,*-1} - \sum_{* \text{ on } *-1} E_{22*}B_{22*}b_{2*}$, which are equal to or added to other subvectors with $*-1$ subscripts. As above, with $j = 2$, the submatrices in Eq. 17 must all be square, but not necessarily nonsingular.

In summary, the matrix identities are:

$$\text{For } x_{1*} - E_{11*}x_{1,*-1} \rightarrow (*), \text{ use } C_{11}x_1 \tag{18}$$

$$\text{For } x_{2,*-1} - \sum_{* \text{ on } *-1} E_{22*}x_{2*} \rightarrow (*-1), \text{ use } C_{22}x_2 \tag{19}$$

$$\text{For } -B_{31*}E_{11*}x_{1,*-1} \rightarrow (*), \text{ use } B_{31}[C_{11} - I]x_1 \tag{20}$$

$$\text{For } x_{1*} - B_{11*}E_{11*}x_{1,*-1} \rightarrow (*), \text{ use } [I + B_{11}[C_{11} - I]]x_1 \tag{21}$$

$$\text{For } -\sum_{* \text{ on } *-1} E_{22*}B_{2j*}b_{j*},\ j = 1, 4 \rightarrow (*-1), \text{ use } [C_{22} - I]B_{2j}b_j,\ j = 1, 4 \tag{22}$$

$$\text{For } b_{2,*-1} - \sum_{* \text{ on } *-1} E_{22*}B_{22*}b_{2*} \rightarrow (*-1), \text{ use } [I + [C_{22} - I]B_{22}]b_2 \tag{23}$$

These special shifting matrix structures ensure that all subvectors are placed into the correct locations in the composite vector arrays. Subvector products such as $B_{ij*}x_{j*} \rightarrow (*)$ or $B_{ij,*-1}x_{j,*-1} \rightarrow (*-1)$ are not shifted, and the corresponding matrix product $B_{ij}x_j$ applies in both cases. Matrix $B_{31}[C_{11} - I]$ is block lower triangular with zero matrices on the block diagonal, is generally rectangular and always singular. Matrix $I + B_{11}[C_{11} - I]$ is always nonsingular and lower triangular with 1's on the diagonal. In a similar manner, matrix $[C_{22} - I]B_{2j}$, $j = 1, 4$, is block upper triangular with zero matrices on the block diagonal, is generally rectangular and always singular. Matrix $I + [C_{22} - I]B_{22}$ is always nonsingular and upper triangular with 1's on the diagonal. Again, these simplifications are due to the special preordering and orientation of the elements.
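Identities 20 and 21 can be spot-checked numerically. In this Python/NumPy sketch the tree is the six-element example of Appendix B, while the block size and the matrices $E_{11*}$, $B_{31*}$, $B_{11*}$ and $x_{1*}$ are hypothetical random data. The check compares the element-by-element subvectors with the composite matrix products and confirms that $I + B_{11}[C_{11} - I]$ is unit lower triangular.

```python
import numpy as np

# Hypothetical data: six-element tree of Appendix B, block size n = 2.
parent = [None, 0, 1, 1, 0, 4]
p, n = len(parent), 2
rng = np.random.default_rng(2)
E11 = [None] + [rng.standard_normal((n, n)) for _ in range(p - 1)]
B31 = [rng.standard_normal((1, n)) for _ in range(p)]   # rectangular blocks (i = 3)
B11 = [rng.standard_normal((n, n)) for _ in range(p)]   # square blocks (i = 1)
x1 = [rng.standard_normal(n) for _ in range(p)]

def blk(k):
    return slice(k * n, (k + 1) * n)

def bdiag(blocks):
    out = np.zeros((sum(b.shape[0] for b in blocks), sum(b.shape[1] for b in blocks)))
    i = j = 0
    for b in blocks:
        out[i:i + b.shape[0], j:j + b.shape[1]] = b
        i, j = i + b.shape[0], j + b.shape[1]
    return out

C11 = np.eye(p * n)
for k in range(1, p):
    C11[blk(k), blk(parent[k])] = -E11[k]
Id, X1 = np.eye(p * n), np.concatenate(x1)

# Eq. 20: subvectors -B31* E11* x1,*-1 land in position * (zero for the root).
sub20 = [np.zeros(1) if parent[k] is None else -B31[k] @ E11[k] @ x1[parent[k]]
         for k in range(p)]
assert np.allclose(np.concatenate(sub20), bdiag(B31) @ (C11 - Id) @ X1)

# Eq. 21: subvectors x1* - B11* E11* x1,*-1 land in position *.
sub21 = [x1[k] if parent[k] is None else x1[k] - B11[k] @ E11[k] @ x1[parent[k]]
         for k in range(p)]
M21 = Id + bdiag(B11) @ (C11 - Id)
assert np.allclose(np.concatenate(sub21), M21 @ X1)
assert np.allclose(np.triu(M21, 1), 0) and np.allclose(np.diag(M21), 1)  # unit lower triangular
```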

With these tools, steps ROL3.1 and ROL4.1 of the previously developed recursive algorithm will now be used to obtain the natural factors. First write the equation in step ROL3.1 as

$$\hat{b}_{2,*-1} - \sum_{* \text{ on } *-1} E_{22*}P_{22*}\hat{b}_{2*} = b_{2,*-1} - \sum_{* \text{ on } *-1} E_{22*}[F_{24*}b_{4*} + \bar{A}_{21*}b_{1*}] \tag{24}$$

where

$$P_{22*} = I - B_{23*}D_{34*}H_{42*} = I - F_{24*}H_{42*} \tag{25}$$

is a projection matrix and

$$\bar{A}_{21*} = \hat{A}_{21*} - F_{24*}H_{42*}\hat{A}_{21*} = P_{22*}\hat{A}_{21*} \tag{26}$$

is a projected coefficient matrix [1]. A second set of equations similar to Eqs. 25 and 26, which will be useful later, is

$$P_{11*} = I - H_{13*}D_{34*}B_{41*} = I - H_{13*}F_{31*} \tag{27}$$

and

$$\bar{A}_{21*} = \hat{A}_{21*} - \hat{A}_{21*}H_{13*}F_{31*} = \hat{A}_{21*}P_{11*} \tag{28}$$
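The projection properties behind Eqs. 25 to 28 are easy to confirm for a single element. In the Python/NumPy sketch below, $\hat{A}_{21*}$, $H_{13*}$ and $H_{42*}$ are hypothetical random matrices chosen so that $D_{34*}$ exists. The assertions check that $P_{22*}$ and $P_{11*}$ are idempotent, that Eqs. 26 and 28 define the same $\bar{A}_{21*}$, that this matrix equals the update term $\hat{A}_{21*} - B_{23*}F_{31*}$ of step ROL2.1, and that $F_{31*}H_{13*} = I$ and $H_{42*}F_{24*} = I$.

```python
import numpy as np

# Hypothetical single-element data; n1 = 4 and n3 = 2 are arbitrary sizes.
rng = np.random.default_rng(3)
n1, n3 = 4, 2
A_hat = rng.standard_normal((n1, n1)) + n1 * np.eye(n1)
H13 = rng.standard_normal((n1, n3))
H42 = rng.standard_normal((n3, n1))

B41, B23 = H42 @ A_hat, A_hat @ H13          # as in step ROL2.1
D34 = np.linalg.inv(B41 @ H13)
F31, F24 = D34 @ B41, B23 @ D34
P22 = np.eye(n1) - F24 @ H42                 # Eq. 25
P11 = np.eye(n1) - H13 @ F31                 # Eq. 27
A_bar = P22 @ A_hat                          # Eq. 26

assert np.allclose(P22 @ P22, P22) and np.allclose(P11 @ P11, P11)
assert np.allclose(A_bar, A_hat @ P11)                    # Eq. 28 gives the same matrix
assert np.allclose(A_bar, A_hat - B23 @ F31)              # update term used in ROL2.1
assert np.allclose(F31 @ H13, np.eye(n3)) and np.allclose(H42 @ F24, np.eye(n3))
```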
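Now use the above identities to obtain

$$[I + [C_{22} - I]P_{22}]\hat{b}_2 = b_2 + [C_{22} - I][F_{24}b_4 + \bar{A}_{21}b_1] \tag{29}$$

or

$$\hat{b}_2 = [I + [C_{22} - I]P_{22}]^{-1}\left[b_2 + [C_{22} - I][F_{24}b_4 + \bar{A}_{21}b_1]\right] \tag{30}$$

where the submatrices of

$$B_{23} = \mathrm{diag}[B_{23a}, B_{23b}, \ldots, B_{23p}] \tag{31}$$

$$F_{24} = \mathrm{diag}[F_{24a}, F_{24b}, \ldots, F_{24p}] \tag{32}$$

$$F_{31} = \mathrm{diag}[F_{31a}, F_{31b}, \ldots, F_{31p}] \tag{33}$$

$$D_{34} = \mathrm{diag}[D_{34a}, D_{34b}, \ldots, D_{34p}] \tag{34}$$

$$P_{11} = \mathrm{diag}[P_{11a}, P_{11b}, \ldots, P_{11p}] \tag{35}$$

$$P_{22} = \mathrm{diag}[P_{22a}, P_{22b}, \ldots, P_{22p}] \tag{36}$$

and

$$\bar{A}_{21} = \mathrm{diag}[\bar{A}_{21a}, \bar{A}_{21b}, \ldots, \bar{A}_{21p}] \tag{37}$$

used in the above equations are obtained from the recursive equations in step ROL2.1 of the above algorithm and Eqs. 25 to 28.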

From step ROL4.1 with

$$x_{3*} = D_{34*}[b_{4*} + H_{42*}\hat{b}_{2*}] - F_{31*}[b_{1*} + E_{11*}x_{1,*-1}] \tag{38}$$

it follows that

$$x_3 = D_{34}[b_4 + H_{42}\hat{b}_2] - F_{31}[b_1 - [C_{11} - I]x_1] \tag{39}$$

This equation cannot be evaluated directly because $x_1$ also depends on $x_3$. However, one can substitute Eq. 5 to eliminate $x_1$ and rearrange to obtain

$$[I + F_{31}[R_{11} - I]H_{13}]x_3 = D_{34}[b_4 + H_{42}\hat{b}_2] - F_{31}R_{11}b_1 \tag{40}$$

In light of the earlier discussions, the matrix in front of $x_3$ in Eq. 40 is nonsingular and lower triangular with 1's on the diagonal, and Eqs. 2, 30 and 40 can now be used to find a symbolic representation of the natural factors of the generalized coefficient matrix in Eq. 2. First invert the left matrix in Eq. 40, substitute Eq. 30 and set the arbitrary quantities $b_1$ and $b_2$ to zero, giving

$$x_3 = [I + F_{31}[R_{11} - I]H_{13}]^{-1}D_{34}\left[I + H_{42}[I + [C_{22} - I]P_{22}]^{-1}[C_{22} - I]F_{24}\right]b_4 \tag{41}$$
Do likewise for Eq. 2, giving

$$x_3 = [H_{42}R_{22}A_{21}R_{11}H_{13}]^{-1}b_4 \tag{42}$$

Since $b_4$ is also arbitrary, it follows by equating coefficients in Eqs. 41 and 42 that

$$[H_{42}R_{22}A_{21}R_{11}H_{13}]^{-1} = [I + F_{31}[R_{11} - I]H_{13}]^{-1}D_{34}\left[I + H_{42}[I + [C_{22} - I]P_{22}]^{-1}[C_{22} - I]F_{24}\right] \tag{43}$$

Equation 43 is the desired factored representation for Eq. 10, since the matrix to the left of $D_{34}$ is lower triangular with 1's on the diagonal and the matrix to the right is upper triangular and also has 1's on the diagonal. Thus it follows that

$$L_{33} = I + F_{31}[R_{11} - I]H_{13} \tag{44}$$

and

$$U_{44} = \left[I + H_{42}[I + [C_{22} - I]P_{22}]^{-1}[C_{22} - I]F_{24}\right]^{-1} \tag{45}$$

Equation 45 is inconvenient to evaluate, but the results from [2] suggest the following alternative expressions

$$L_{33} = \left[I + F_{31}[C_{11} - I][I + P_{11}[C_{11} - I]]^{-1}H_{13}\right]^{-1} \tag{46}$$

and

$$U_{44} = I + H_{42}[R_{22} - I]F_{24} \tag{47}$$

Equations 46 and 47 are verified in Appendix A. The matrices in Eqs. 44 and 47 will be used to provide the optimal solution algorithm for Eq. 2.

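While Eqs. 44 and 47 represent the most effective implementation of the natural factors, an interesting different view of these factors can be obtained by first noting that

$$F_{31}H_{13} = D_{34}B_{41}H_{13} = D_{34}D_{34}^{-1} = I \tag{48}$$

and

$$H_{42}F_{24} = H_{42}B_{23}D_{34} = D_{34}^{-1}D_{34} = I \tag{49}$$

Then these equations can be used to express the factors as

$$\begin{aligned} L_{33} &= F_{31}R_{11}H_{13} \\ &= D_{34}[B_{41}R_{11}H_{13}] \\ &= [H_{42}\hat{A}_{21}H_{13}]^{-1}[H_{42}\hat{A}_{21}R_{11}H_{13}] \end{aligned} \tag{50}$$

and

$$\begin{aligned} U_{44} &= H_{42}R_{22}F_{24} \\ &= [H_{42}R_{22}\hat{A}_{21}H_{13}]D_{34} \\ &= [H_{42}R_{22}\hat{A}_{21}H_{13}][H_{42}\hat{A}_{21}H_{13}]^{-1} \end{aligned} \tag{51}$$

where

$$D_{34} = [H_{42}\hat{A}_{21}H_{13}]^{-1} \tag{52}$$

Finally, substitute these equations into Eq. 11 to give

$$\begin{aligned} U_{44}D_{34}^{-1}L_{33} &= [H_{42}R_{22}\hat{A}_{21}H_{13}][H_{42}\hat{A}_{21}H_{13}]^{-1}[H_{42}\hat{A}_{21}R_{11}H_{13}] \\ &= [H_{42}R_{22}]\left[\hat{A}_{21}H_{13}[H_{42}\hat{A}_{21}H_{13}]^{-1}H_{42}\hat{A}_{21}\right][R_{11}H_{13}] \\ &= [H_{42}R_{22}]A_{21}[R_{11}H_{13}] \end{aligned} \tag{53}$$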

4. THE GENERALIZED SOLUTION ALGORITHM

The first step in determining $x_3$ in Eq. 2, given all the necessary matrices and vectors, is to find $y_4$ from

$$U_{44}y_4 = b_4 + H_{42}R_{22}[b_2 - A_{21}R_{11}b_1] \tag{54}$$

Since $U_{44}$ is upper triangular, evaluate the subvectors of $y_4$ from bottom to top. Next, since Eq. 53 factors the generalized matrix as $U_{44}D_{34}^{-1}L_{33}$, evaluate the subvectors of $x_3$ from top to bottom by solving

$$L_{33}x_3 = D_{34}y_4 \tag{55}$$

Matrix $L_{33}$ always has the same block fill pattern as $R_{11}$, and it was noted earlier that the degree of fill is strictly a function of system topology. Thus the overhead in Eqs. 54 and 55 can vary from $O(p^2)$ for serial trees, where $R_{11}$ and $L_{33}$ have maximum fill below the diagonal, to $O(p)$ for completely parallel systems, in which case $R_{11}$ and $L_{33}$ have minimum fill below the diagonal.

The revised computational algorithm is now presented, where the symbol POL stands for "Parallel Open Loop".

PARALLEL OPEN LOOP ALGORITHM

POL1 Evaluate the components of $A_{21}$, $H_{13}$, $H_{42}$, $b_1$, $b_2$, $b_4$, $C_{11}$ and $C_{22}$

POL2.0 Initialize $\hat{A}_{21} = A_{21}$
POL2.1 For $* = p$ to $a$ repeat
  ( $B_{41*} \leftarrow H_{42*}\hat{A}_{21*}$
    $B_{23*} \leftarrow \hat{A}_{21*}H_{13*}$
    $D_{34*} \leftarrow [B_{41*}H_{13*}]^{-1}$
    $F_{31*} \leftarrow D_{34*}B_{41*}$
    $F_{24*} \leftarrow B_{23*}D_{34*}$
    $\hat{A}_{21,*-1} \leftarrow \hat{A}_{21,*-1} + E_{22*}[\hat{A}_{21*} - B_{23*}F_{31*}]E_{11*}$ )
  Skip the last equation when $*$ corresponds to any row of $C_{11}$ or column of $C_{22}$ with only an $I$, since the corresponding submatrix $E_{11*}$ or $E_{22*}$ is zero.

POL3 Evaluate
  $L_{33} = I + F_{31}[R_{11} - I]H_{13}$
  $U_{44} = I + H_{42}[R_{22} - I]F_{24}$

POL4 Solve
  $U_{44}y_4 = b_4 + H_{42}R_{22}[b_2 - A_{21}R_{11}b_1]$ for $y_4$
  $L_{33}x_3 = D_{34}y_4$ for $x_3$

POL5 If desired, evaluate
  $x_1 = R_{11}[b_1 + H_{13}x_3]$
  $x_2 = -R_{22}[b_2 - A_{21}x_1]$
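The POL steps can be sketched with dense NumPy linear algebra on the same kind of hypothetical data used for ROL: the six-element tree of Appendix B and random blocks with $H_{42*} = H_{13*}^T$ and $E_{22*} = E_{11*}^T$ assumed. The sketch forms $L_{33}$ and $U_{44}$ from Eqs. 44 and 47, compares them with the alternative forms of Eqs. 46 and 45, checks Eq. 53, and then carries out POL4 and POL5. Dense inverses and `np.linalg.solve` stand in for the triangular substitutions; a practical implementation would exploit the block sparsity instead.

```python
import numpy as np

# Hypothetical data on the six-element tree of Appendix B (a..f -> 0..5).
parent = [None, 0, 1, 1, 0, 4]
p, n1, n3 = len(parent), 3, 1
rng = np.random.default_rng(7)

def spd(n):
    m = rng.standard_normal((n, n))
    return m @ m.T + n * np.eye(n)

def bdiag(blocks):
    out = np.zeros((sum(b.shape[0] for b in blocks), sum(b.shape[1] for b in blocks)))
    i = j = 0
    for b in blocks:
        out[i:i + b.shape[0], j:j + b.shape[1]] = b
        i, j = i + b.shape[0], j + b.shape[1]
    return out

A21 = [spd(n1) for _ in range(p)]
H13 = [rng.standard_normal((n1, n3)) for _ in range(p)]
H42 = [h.T for h in H13]                                  # assumption
E11 = [None] + [rng.standard_normal((n1, n1)) for _ in range(p - 1)]

# POL2: recursion of step ROL2.1, keeping the block-diagonal factors (E22* = E11*^T assumed).
Ah = [a.copy() for a in A21]
D34, F31, F24, P11, P22 = ([None] * p for _ in range(5))
for k in reversed(range(p)):
    B41, B23 = H42[k] @ Ah[k], Ah[k] @ H13[k]
    D34[k] = np.linalg.inv(B41 @ H13[k])
    F31[k], F24[k] = D34[k] @ B41, B23 @ D34[k]
    P11[k] = np.eye(n1) - H13[k] @ F31[k]
    P22[k] = np.eye(n1) - F24[k] @ H42[k]
    if parent[k] is not None:
        Ah[parent[k]] += E11[k].T @ (Ah[k] - B23 @ F31[k]) @ E11[k]

N1, N3 = p * n1, p * n3
C11 = np.eye(N1)
for k in range(1, p):
    C11[k * n1:(k + 1) * n1, parent[k] * n1:(parent[k] + 1) * n1] = -E11[k]
C22 = C11.T                                               # consistent with E22* = E11*^T
R11, R22 = np.linalg.inv(C11), np.linalg.inv(C22)
I1, I3 = np.eye(N1), np.eye(N3)
A21d, H13d, H42d = bdiag(A21), bdiag(H13), bdiag(H42)
D34d, F31d, F24d, P11d, P22d = map(bdiag, (D34, F31, F24, P11, P22))

# POL3: natural factors (Eqs. 44 and 47) and their alternative forms (Eqs. 46 and 45).
L33 = I3 + F31d @ (R11 - I1) @ H13d
U44 = I3 + H42d @ (R22 - I1) @ F24d
L33_alt = np.linalg.inv(I3 + F31d @ (C11 - I1) @ np.linalg.inv(I1 + P11d @ (C11 - I1)) @ H13d)
U44_alt = np.linalg.inv(I3 + H42d @ np.linalg.inv(I1 + (C22 - I1) @ P22d) @ (C22 - I1) @ F24d)
assert np.allclose(L33, L33_alt) and np.allclose(U44, U44_alt)
assert np.allclose(np.triu(L33, 1), 0) and np.allclose(np.tril(U44, -1), 0)

# Eq. 53: U44 D34^-1 L33 equals the generalized matrix H42 R22 A21 R11 H13.
A43 = H42d @ R22 @ A21d @ R11 @ H13d
assert np.allclose(U44 @ np.linalg.inv(D34d) @ L33, A43)

# POL4 and POL5 for random right-hand sides.
b1, b2, b4 = rng.standard_normal(N1), rng.standard_normal(N1), rng.standard_normal(N3)
rhs = b4 + H42d @ R22 @ (b2 - A21d @ R11 @ b1)
y4 = np.linalg.solve(U44, rhs)                            # Eq. 54 (upper triangular)
x3 = np.linalg.solve(L33, D34d @ y4)                      # Eq. 55 (lower triangular)
x1 = R11 @ (b1 + H13d @ x3)
x2 = -R22 @ (b2 - A21d @ x1)
assert np.allclose(A43 @ x3, rhs)
```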

5. SYMBOLIC FACTORS OF PARTITIONED AND SINGULAR MATRICES

Inspection of the matrix in Eq. 1 reveals that the submatrix

$$\begin{bmatrix} A_{21} & -C_{22} \\ C_{11} & 0 \end{bmatrix}^{-1} = \begin{bmatrix} 0 & R_{11} \\ -R_{22} & R_{22}A_{21}R_{11} \end{bmatrix} \tag{56}$$

is nonsingular, independent of the rank of $A_{21}$ (see more discussion in Appendix B). Thus $x_1$ and $x_2$ can be determined regardless of the rank deficiency of the coefficient matrix in Eq. 1 or 2. Rank deficiency in these matrices is reflected in the rank deficiency of the individual matrices

$$D_{34*}^{-1} = H_{42*}\hat{A}_{21*}H_{13*} \tag{57}$$

in the recursive steps ROL2.1 or POL2.1. If a given $D_{34*}^{-1}$ is singular, the corresponding Eq. 38 must be returned to the form

$$D_{34*}^{-1}x_{3*} = H_{42*}\hat{A}_{21*}H_{13*}x_{3*} = b_{4*} + H_{42*}\hat{b}_{2*} - B_{41*}[b_{1*} + E_{11*}x_{1,*-1}] \tag{58}$$

Equation 58 indicates that components of $x_{3*}$ equal in number to the column rank deficiency of $D_{34*}^{-1}$ cannot be computed and must be supplied, and/or a number of equations equal to the row rank deficiency of $D_{34*}^{-1}$ are dependent and must be checked for consistency and/or eliminated. In the first situation, the undetermined components of $x_{3*}$ are assumed to be specified, or the problem will be ill posed. In addition, one may wish to specify or drive one or more of the components of $x_{3*}$ even though $D_{34*}^{-1}$ may have full column rank. In either situation, these known quantities can be moved to the right hand side of the equations and eliminated from the recursive solution process. With this knowledge, $x_{3*}$, $b_{4*}$, $H_{13*}$ and $H_{42*}$ can be partitioned according to free and specified variables, and independent and redundant equations, as


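$$x_{3*} = \begin{bmatrix} x'_{3*} \\ x''_{3*} \end{bmatrix} \tag{59}$$

$$b_{4*} = \begin{bmatrix} b'_{4*} \\ b''_{4*} \end{bmatrix} \tag{60}$$

$$H_{13*} = \begin{bmatrix} H'_{13*} & H''_{13*} \end{bmatrix} \tag{61}$$

and

$$H_{42*} = \begin{bmatrix} H'_{42*} \\ H''_{42*} \end{bmatrix} \tag{62}$$

In a similar manner, Eq. 1 can be partitioned as

$$\begin{bmatrix} A_{21} & -C_{22} & 0 & 0 \\ C_{11} & 0 & -H'_{13} & -H''_{13} \\ 0 & H'_{42} & 0 & 0 \\ 0 & H''_{42} & 0 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x'_3 \\ x''_3 \end{bmatrix} = \begin{bmatrix} b_2 \\ b_1 \\ b'_4 \\ b''_4 + \tilde{b}_4 \end{bmatrix} \tag{63}$$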


The dimensions of $x'_3$ and $b'_4$ will always be the same, matrix $H'_{13}$ will have full column rank, and matrix $H'_{42}$ will have full row rank. The slack variables in vector $\tilde{b}_4$ are introduced to ensure equality in the last set of dependent redundant equations.

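The first three sets of equations may be rewritten as

$$\begin{bmatrix} A_{21} & -C_{22} & 0 \\ C_{11} & 0 & -H'_{13} \\ 0 & H'_{42} & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x'_3 \end{bmatrix} = \begin{bmatrix} b_2 \\ b_1 + H''_{13}x''_3 \\ b'_4 \end{bmatrix} \tag{64}$$

and the last set of equations as

$$\tilde{b}_4 = -b''_4 + H''_{42}x_2 \tag{65}$$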
By construction, the coefficient matrix in Eq. 64 is now nonsingular, and the general equation equivalent to Eq. 2 with nonsingular coefficient matrix is

$$H'_{42}R_{22}A_{21}R_{11}H'_{13}x'_3 = b'_4 + H'_{42}R_{22}\left[b_2 - A_{21}R_{11}[b_1 + H''_{13}x''_3]\right] \tag{66}$$

The "Recursive Open Loop algorithm for Singular matrices" (ROLS) may now be stated. For simplicity, the following algorithm assumes that all singularities are known in advance and that the partitioning into free and specified variables is known. A more general algorithm would detect rank deficiencies in the submatrices $D_{34*}^{-1}$ at each step and take appropriate action as necessary.

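RECURSIVE OPEN LOOP ALGORITHM FOR SINGULAR MATRICES
ROLS1 Evaluate the components of $A_{21}$, $H_{13}$, $H_{42}$, $b_1$, $b_2$, $b_4$, $C_{11}$, $C_{22}$ and $x''_3$

ROLS2.0 Initialize $\hat{A}_{21} = A_{21}$
ROLS2.1 For $* = p$ to $a$ repeat
  ( $B_{41*} \leftarrow H'_{42*}\hat{A}_{21*}$
    $B_{23*} \leftarrow \hat{A}_{21*}H'_{13*}$
    $D_{34*} \leftarrow [B_{41*}H'_{13*}]^{-1}$
    $F_{31*} \leftarrow D_{34*}B_{41*}$
    $F_{24*} \leftarrow B_{23*}D_{34*}$
    $\hat{A}_{21,*-1} \leftarrow \hat{A}_{21,*-1} + E_{22*}[\hat{A}_{21*} - B_{23*}F_{31*}]E_{11*}$ )
  Skip the last equation when $*$ corresponds to any row of $C_{11}$ or column of $C_{22}$ with only an $I$, since the corresponding submatrix $E_{11*}$ or $E_{22*}$ is zero.

ROLS3.0 Initialize $\hat{b}_2 = b_2$
ROLS3.1 For $* = p$ to $b$ repeat
  ( $\hat{x}'_{3*} \leftarrow D_{34*}[b'_{4*} + H'_{42*}\hat{b}_{2*}] - F_{31*}[b_{1*} + H''_{13*}x''_{3*}]$
    $\hat{b}_{2,*-1} \leftarrow \hat{b}_{2,*-1} + E_{22*}[\hat{b}_{2*} - \hat{A}_{21*}[b_{1*} + H'_{13*}\hat{x}'_{3*} + H''_{13*}x''_{3*}]]$ )
  Skip the second equation when $*$ corresponds to any column of $C_{22}$ with only an $I$.
ROLS3.2 $\hat{x}'_{3a} = D_{34a}[b'_{4a} + H'_{42a}\hat{b}_{2a}] - F_{31a}[b_{1a} + H''_{13a}x''_{3a}]$

ROLS4 $x_{1a} = b_{1a} + H_{13a}x_{3a}$
ROLS4.1 For $* = b$ to $p$ repeat
  ( $x'_{3*} \leftarrow \hat{x}'_{3*} - F_{31*}[E_{11*}x_{1,*-1}]$
    $x_{1*} \leftarrow [E_{11*}x_{1,*-1}] + b_{1*} + H_{13*}x_{3*}$ )
  The vector $E_{11*}x_{1,*-1} = 0$ when $*$ corresponds to any row of $C_{11}$ with only an $I$.

ROLS5.0 Initialize $x_2 = -b_2 + A_{21}x_1$
ROLS5.1 For $* = p$ to $b$ repeat
  ( $x_{2,*-1} \leftarrow x_{2,*-1} + E_{22*}x_{2*}$
    Skip this equation when $*$ corresponds to any column of $C_{22}$ with only an $I$.
    $\tilde{b}_{4*} \leftarrow -b''_{4*} + H''_{42*}x_{2*}$ )
  Skip this equation when $H''_{42*}$ is null.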


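Following the same procedures as above, the revised "Parallel Open Loop algorithm for Singular matrices" (POLS) becomes:

PARALLEL OPEN LOOP ALGORITHM FOR SINGULAR MATRICES

POLS1 Evaluate the components of $A_{21}$, $H_{13}$, $H_{42}$, $b_1$, $b_2$, $b_4$, $C_{11}$, $C_{22}$ and $x''_3$

POLS2.0 Initialize $\hat{A}_{21} = A_{21}$
POLS2.1 For $* = p$ to $a$ repeat
  ( $B_{41*} \leftarrow H'_{42*}\hat{A}_{21*}$
    $B_{23*} \leftarrow \hat{A}_{21*}H'_{13*}$
    $D_{34*} \leftarrow [B_{41*}H'_{13*}]^{-1}$
    $F_{31*} \leftarrow D_{34*}B_{41*}$
    $F_{24*} \leftarrow B_{23*}D_{34*}$
    $\hat{A}_{21,*-1} \leftarrow \hat{A}_{21,*-1} + E_{22*}[\hat{A}_{21*} - B_{23*}F_{31*}]E_{11*}$ )
  Skip the last equation when $*$ corresponds to any row of $C_{11}$ or column of $C_{22}$ with only an $I$, since the corresponding submatrix $E_{11*}$ or $E_{22*}$ is zero.

POLS3 Evaluate
  $L_{33} = I + F_{31}[R_{11} - I]H'_{13}$
  $U_{44} = I + H'_{42}[R_{22} - I]F_{24}$

POLS4 Solve
  $U_{44}y_4 = b'_4 + H'_{42}R_{22}\left[b_2 - A_{21}R_{11}[b_1 + H''_{13}x''_3]\right]$ for $y_4$
  $L_{33}x'_3 = D_{34}y_4$ for $x'_3$

POLS5 If desired, evaluate
  $x_1 = R_{11}[b_1 + H_{13}x_3]$
  $x_2 = -R_{22}[b_2 - A_{21}x_1]$
  $\tilde{b}_4 = -b''_4 + H''_{42}x_2$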
These algorithms show that the recursive process makes it a simple matter to efficiently
eliminate  any  number  of  embedded  singularities  and  redundant  equations,  and  to  find  symbolic 
factors  for  the  largest  nonsingular  submatrix. 
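The effect of partitioning into free and specified variables can be illustrated with dense matrices. The Python/NumPy sketch below uses hypothetical random data, with `H13f` and `H13s` standing for $H'_{13}$ and $H''_{13}$, and `H42i` and `H42r` for $H'_{42}$ and $H''_{42}$. It solves the reduced system of Eq. 66 for $x'_3$ given $x''_3$, recovers $x_1$ and $x_2$, checks the three block rows of Eq. 64, and evaluates the slack of Eq. 65.

```python
import numpy as np

# Hypothetical sizes: x3 has nf free and ns specified components; b4 has nf independent
# equations and one redundant equation.
rng = np.random.default_rng(8)
n1, nf, ns = 6, 2, 1
C11 = np.eye(n1) + np.tril(rng.standard_normal((n1, n1)), -1)
C22 = C11.T                                               # assumption for convenience
A21 = rng.standard_normal((n1, n1)) + n1 * np.eye(n1)
H13f, H13s = rng.standard_normal((n1, nf)), rng.standard_normal((n1, ns))  # H13', H13''
H42i, H42r = rng.standard_normal((nf, n1)), rng.standard_normal((1, n1))   # H42', H42''
b1, b2, b4i, b4r = (rng.standard_normal(m) for m in (n1, n1, nf, 1))
x3s = rng.standard_normal(ns)                             # specified components x3''
R11, R22 = np.linalg.inv(C11), np.linalg.inv(C22)

# Eq. 66: reduced generalized system for the free components x3'.
A43f = H42i @ R22 @ A21 @ R11 @ H13f
x3f = np.linalg.solve(A43f, b4i + H42i @ R22 @ (b2 - A21 @ R11 @ (b1 + H13s @ x3s)))

# Recover x1 and x2, then check the three block rows of Eq. 64.
x1 = R11 @ (b1 + H13s @ x3s + H13f @ x3f)
x2 = -R22 @ (b2 - A21 @ x1)
assert np.allclose(A21 @ x1 - C22 @ x2, b2)
assert np.allclose(C11 @ x1 - H13f @ x3f, b1 + H13s @ x3s)
assert np.allclose(H42i @ x2, b4i)

slack = -b4r + H42r @ x2                                  # Eq. 65
```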

6. ITERATIVE REFINEMENT

Consider the linear system of equations $Ax = b$ and suppose the product of lower triangular matrix $L$ and upper triangular matrix $U$ approximates $A$. Then solve the equivalent system of equations $LUx^{(1)} = b$ for $x^{(1)}$, the first approximation to $x$. The residual vector for this first iteration is

$$r^{(1)} = b - Ax^{(1)} \tag{67}$$

Inverting $A$ in Eq. 67 yields $A^{-1}r^{(1)} = x - x^{(1)}$ and implies that a correction $\Delta x^{(1)}$ to $x$ may be approximated by solving

$$LU\,\Delta x^{(1)} = r^{(1)} \tag{68}$$

giving a second approximation for $x$

$$x^{(2)} = x^{(1)} + \Delta x^{(1)} \tag{69}$$

In general the iterative process repeats to the $k$th step as

$$r^{(k)} = b - Ax^{(k)} \tag{70}$$

$$LU\,\Delta x^{(k)} = r^{(k)} \tag{71}$$

and

$$x^{(k+1)} = x^{(k)} + \Delta x^{(k)} \tag{72}$$

One can show by induction that at the $k$th iteration

$$r^{(k)} = \left[I - A[LU]^{-1}\right]^{k}b \tag{73}$$

since $r^{(1)} = [I - A[LU]^{-1}]b$ and Eqs. 70 to 72 give $r^{(k+1)} = r^{(k)} - A\,\Delta x^{(k)} = [I - A[LU]^{-1}]r^{(k)}$. This implies that the spectral radius of $I - A[LU]^{-1}$ should be less than 1 [4]. Clearly if $LU = A$ then $r^{(1)} = 0$. As the product $LU$ deviates from $A$, the rate of convergence decreases and the number of iterations to an acceptable solution increases. An excessive number of iterations indicates the need to update $L$ and $U$.
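The refinement loop of Eqs. 67 to 72 can be illustrated independently of the tree algorithms. In the Python/NumPy sketch below, the approximate factors are an exact LU factorization (without pivoting) of an older, hypothetical matrix `A_old`, while the system actually solved uses a slightly perturbed `A`, mimicking a slowly varying system. The spectral radius of $I - A[LU]^{-1}$ is printed for comparison with the convergence condition stated above.

```python
import numpy as np

def lu_nopivot(M):
    """Doolittle LU without pivoting (adequate for the diagonally dominant demo matrix)."""
    n = M.shape[0]
    L, U = np.eye(n), M.astype(float).copy()
    for j in range(n - 1):
        for i in range(j + 1, n):
            L[i, j] = U[i, j] / U[j, j]
            U[i, :] -= L[i, j] * U[j, :]
    return L, U

def lu_solve(L, U, r):
    """Solve L U x = r by forward and back substitution."""
    n = len(r)
    y = np.zeros(n)
    for i in range(n):
        y[i] = r[i] - L[i, :i] @ y[:i]
    x = np.zeros(n)
    for i in reversed(range(n)):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

rng = np.random.default_rng(4)
n = 8
A_old = rng.standard_normal((n, n)) + 2 * n * np.eye(n)   # matrix the factors were built for
A = A_old + 0.05 * rng.standard_normal((n, n))            # current, slightly changed matrix
b = rng.standard_normal(n)

L, U = lu_nopivot(A_old)                                  # LU only approximates A
x = lu_solve(L, U, b)                                     # first approximation x^(1)
for k in range(1, 51):
    r = b - A @ x                                         # Eq. 70
    if np.linalg.norm(r) <= 1e-12 * np.linalg.norm(b):
        break
    x = x + lu_solve(L, U, r)                             # Eqs. 71 and 72
rho = max(abs(np.linalg.eigvals(np.eye(n) - A @ np.linalg.inv(L @ U))))
print(f"stopped at k = {k}, spectral radius of I - A(LU)^-1 = {rho:.3f}")
```

Because the factors are reused unchanged, each pass costs only a residual evaluation and two triangular solves, which is the saving the text attributes to bypassing ROL2, POL2 and POL3.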

Iterative refinement is useful in the above algorithms, especially for slowly varying systems, because the costly steps in ROL2, POL2 and POL3 can be avoided most of the time. If the iterations converge quickly, this can yield substantial savings in computer time. Furthermore, iterative refinement allows the POL algorithm to exploit vector and parallel processors more effectively, since less time is spent in the serial operations necessary to evaluate the matrices in steps POL2 and POL3. The problem is to find equations to inexpensively evaluate the residuals for the above algorithms and to incorporate them. Residual calculations are always based on updated quantities, not approximations, since the residuals must go to zero as the iteration converges to the correct solution. The residuals for the first algorithm are easily obtained with the help of Eq. 1. Let $x_3^{(k)}$ be the $k$th approximation to the solution and compute $x_1$, $x_2$ and $r_4^{(k)}$ from

$$C_{11}x_1 = b_1 + H_{13}x_3^{(k)} \tag{74}$$

$$C_{22}x_2 = -[b_2 - A_{21}x_1] \tag{75}$$

and

$$r_4^{(k)} = b_4 - H_{42}x_2 \tag{76}$$
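The residual recipe of Eqs. 74 to 76 can be checked against the closed form derived in the following paragraph (Eq. 77). In this Python/NumPy sketch the matrices, the vectors and the estimate `x3k`, which stands for $x_3^{(k)}$, are hypothetical random data; $C_{11}$ is an arbitrary unit lower triangular matrix and $C_{22} = C_{11}^T$ is assumed only for convenience.

```python
import numpy as np

# Hypothetical dense data; C22 = C11^T is assumed only for convenience.
rng = np.random.default_rng(5)
n1, n3 = 6, 2
C11 = np.eye(n1) + np.tril(rng.standard_normal((n1, n1)), -1)
C22 = C11.T
A21 = rng.standard_normal((n1, n1)) + n1 * np.eye(n1)
H13 = rng.standard_normal((n1, n3))
H42 = rng.standard_normal((n3, n1))
b1, b2, b4 = rng.standard_normal(n1), rng.standard_normal(n1), rng.standard_normal(n3)
x3k = rng.standard_normal(n3)                              # current estimate x3^(k)

# Eqs. 74 to 76: two solves with C11 and C22, then one matrix-vector product.
x1 = np.linalg.solve(C11, b1 + H13 @ x3k)                  # Eq. 74
x2 = np.linalg.solve(C22, -(b2 - A21 @ x1))                # Eq. 75
r4 = b4 - H42 @ x2                                         # Eq. 76

# Closed form (Eq. 77), the residual of the generalized system of Eq. 2.
R11, R22 = np.linalg.inv(C11), np.linalg.inv(C22)
r4_closed = (b4 + H42 @ R22 @ (b2 - A21 @ R11 @ b1)
             - H42 @ R22 @ A21 @ R11 @ H13 @ x3k)
assert np.allclose(r4, r4_closed)
```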

To see that this is the correct residual, these equations may be combined with the help of Eqs. 5 and 6 to obtain

$$r_4^{(k)} = b_4 + H_{42}R_{22}[b_2 - A_{21}R_{11}b_1] - H_{42}R_{22}A_{21}R_{11}H_{13}x_3^{(k)} \tag{77}$$

which is simply the residual of Eq. 2. Thus the residual in Eq. 76 or 77 is appropriate for both of the algorithms. According to Eq. 71, Eq. 77 implies that

$$H_{42}R_{22}A_{21}R_{11}H_{13}\,\Delta x_3^{(k)} = r_4^{(k)} \tag{78}$$

which from Eqs. 1 and 2 leads to

$$\begin{bmatrix} A_{21} & -C_{22} & 0 \\ C_{11} & 0 & -H_{13} \\ 0 & H_{42} & 0 \end{bmatrix}\begin{bmatrix} \Delta x_1 \\ \Delta x_2 \\ \Delta x_3^{(k)} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ r_4^{(k)} \end{bmatrix} \tag{79}$$

and

$$x_3^{(k+1)} = x_3^{(k)} + \Delta x_3^{(k)} \tag{80}$$

The quantities $\Delta x_1$ and $\Delta x_2$ from Eq. 79 cannot be used to update $x_1$ and $x_2$, because there are no residuals associated with them and one cannot be sure that they will satisfy Eqs. 74 and 75. Therefore, $x_1$ and $x_2$ must be computed directly from Eqs. 74 and 75 for evaluating the residual in Eq. 76. Thus Eqs. 74 to 76 and 78 to 80 provide the information necessary to add iterative refinement to the above two algorithms. The modified "Recursive Open Loop algorithm with Iterative refinement" (ROLI) follows as:

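RECURSIVE OPEN LOOP ALGORITHM WITH ITERATIVE REFINEMENT

ROLI1 Evaluate the components of $A_{21}$, $H_{13}$, $H_{42}$, $b_1$, $b_2$, $b_4$, $C_{11}$ and $C_{22}$

ROLI2.0 Bypass steps ROLI2.1 and ROLI2.2 unless convergence rate is slow
ROLI2.1 Initialize $\hat{A}_{21} = A_{21}$
ROLI2.2 For $* = p$ to $a$ repeat
  ( $B_{41*} \leftarrow H_{42*}\hat{A}_{21*}$
    $B_{23*} \leftarrow \hat{A}_{21*}H_{13*}$
    $D_{34*} \leftarrow [B_{41*}H_{13*}]^{-1}$
    $F_{31*} \leftarrow D_{34*}B_{41*}$
    $F_{24*} \leftarrow B_{23*}D_{34*}$
    $\hat{A}_{21,*-1} \leftarrow \hat{A}_{21,*-1} + E_{22*}[\hat{A}_{21*} - B_{23*}F_{31*}]E_{11*}$ )
  Skip the last equation when $*$ corresponds to any row of $C_{11}$ or column of $C_{22}$ with only an $I$, since the corresponding submatrix $E_{11*}$ or $E_{22*}$ is zero.

ROLI3.0 Initialize $\hat{b}_2 = b_2$
ROLI3.1 For $* = p$ to $b$ repeat
  ( $\hat{x}_{3*} \leftarrow D_{34*}[b_{4*} + H_{42*}\hat{b}_{2*}] - F_{31*}b_{1*}$
    $\hat{b}_{2,*-1} \leftarrow \hat{b}_{2,*-1} + E_{22*}[\hat{b}_{2*} - \hat{A}_{21*}[b_{1*} + H_{13*}\hat{x}_{3*}]]$ )
  Skip the second equation when $*$ corresponds to any column of $C_{22}$ with only an $I$.
ROLI3.2 $\hat{x}_{3a} = D_{34a}[b_{4a} + H_{42a}\hat{b}_{2a}] - F_{31a}b_{1a}$

ROLI4 $x_{1a} = b_{1a} + H_{13a}x_{3a}^{(0)}$
ROLI4.1 For $* = b$ to $p$ repeat
  ( $x_{3*}^{(0)} \leftarrow \hat{x}_{3*} - F_{31*}[E_{11*}x_{1,*-1}]$
    $x_{1*} \leftarrow [E_{11*}x_{1,*-1}] + b_{1*} + H_{13*}x_{3*}^{(0)}$ )
  The vector $E_{11*}x_{1,*-1} = 0$ when $*$ corresponds to any row of $C_{11}$ with only an $I$.

LOOP For $k = 0, 1, \ldots$ do to LOOP End

ROLI5.0 Initialize $x_2 = -b_2 + A_{21}x_1$
ROLI5.1 For $* = p$ to $b$ repeat
  ( $x_{2,*-1} \leftarrow x_{2,*-1} + E_{22*}x_{2*}$ )
  Skip this equation when $*$ corresponds to any column of $C_{22}$ with only an $I$.

ROLI6 Evaluate $r_4^{(k)} = b_4 - H_{42}x_2$ and exit if small

ROLI7.0 Initialize $\hat{r}_2 = 0$
ROLI7.1 For $* = p$ to $b$ repeat
  ( $\Delta\hat{x}_{3*} \leftarrow D_{34*}[r_{4*}^{(k)} + H_{42*}\hat{r}_{2*}]$
    $\hat{r}_{2,*-1} \leftarrow \hat{r}_{2,*-1} + E_{22*}[\hat{r}_{2*} - B_{23*}\Delta\hat{x}_{3*}]$ )
  Skip the second equation when $*$ corresponds to any column of $C_{22}$ with only an $I$.
ROLI7.2 ( $\Delta x_{3a} = D_{34a}[r_{4a}^{(k)} + H_{42a}\hat{r}_{2a}]$
          $x_{3a}^{(k+1)} = x_{3a}^{(k)} + \Delta x_{3a}$ )

ROLI8 ( $\Delta x_{1a} = H_{13a}\Delta x_{3a}$
        $x_{1a} = b_{1a} + H_{13a}x_{3a}^{(k+1)}$ )
ROLI8.1 For $* = b$ to $p$ repeat
  ( $\Delta x_{3*} \leftarrow \Delta\hat{x}_{3*} - F_{31*}[E_{11*}\Delta x_{1,*-1}]$
    $\Delta x_{1*} \leftarrow [E_{11*}\Delta x_{1,*-1}] + H_{13*}\Delta x_{3*}$
    The vector $E_{11*}\Delta x_{1,*-1} = 0$ when $*$ corresponds to any row of $C_{11}$ with only an $I$.
    $x_{3*}^{(k+1)} = x_{3*}^{(k)} + \Delta x_{3*}$
    $x_{1*} \leftarrow [E_{11*}x_{1,*-1}] + b_{1*} + H_{13*}x_{3*}^{(k+1)}$ )
  The vector $E_{11*}x_{1,*-1} = 0$ when $*$ corresponds to any row of $C_{11}$ with only an $I$.

LOOP End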

The modified "Parallel Open Loop algorithm with Iterative refinement" (POLI) follows as:

PARALLEL OPEN LOOP ALGORITHM WITH ITERATIVE REFINEMENT
POLI1 Evaluate the components of $A_{21}$, $H_{13}$, $H_{42}$, $b_1$, $b_2$, $b_4$, $C_{11}$ and $C_{22}$

POLI2.0 Bypass steps POLI2.1, POLI2.2 and POLI3 unless convergence rate is slow
POLI2.1 Initialize $\hat{A}_{21} = A_{21}$
POLI2.2 For $* = p$ to $a$ repeat
  ( $B_{41*} \leftarrow H_{42*}\hat{A}_{21*}$
    $B_{23*} \leftarrow \hat{A}_{21*}H_{13*}$
    $D_{34*} \leftarrow [B_{41*}H_{13*}]^{-1}$
    $F_{31*} \leftarrow D_{34*}B_{41*}$
    $F_{24*} \leftarrow B_{23*}D_{34*}$
    $\hat{A}_{21,*-1} \leftarrow \hat{A}_{21,*-1} + E_{22*}[\hat{A}_{21*} - B_{23*}F_{31*}]E_{11*}$ )
  Skip the last equation when $*$ corresponds to any row of $C_{11}$ or column of $C_{22}$ with only an $I$, since the corresponding submatrix $E_{11*}$ or $E_{22*}$ is zero.

POLI3 Evaluate
  $L_{33} = I + F_{31}[R_{11} - I]H_{13}$
  $U_{44} = I + H_{42}[R_{22} - I]F_{24}$

POLI4 Solve
  $U_{44}y_4 = b_4 + H_{42}R_{22}[b_2 - A_{21}R_{11}b_1]$ for $y_4$
  $L_{33}x_3^{(0)} = D_{34}y_4$ for $x_3^{(0)}$

LOOP For $k = 0, 1, \ldots$ do to LOOP End

POLI5 Evaluate
  $x_1 = R_{11}[b_1 + H_{13}x_3^{(k)}]$
  $x_2 = -R_{22}[b_2 - A_{21}x_1]$

POLI6 Evaluate $r_4^{(k)} = b_4 - H_{42}x_2$ and exit if small

POLI7 Solve
  $U_{44}\Delta y_4 = r_4^{(k)}$ for $\Delta y_4$
  $L_{33}\Delta x_3^{(k)} = D_{34}\Delta y_4$ for $\Delta x_3^{(k)}$

POLI8 Evaluate $x_3^{(k+1)} = x_3^{(k)} + \Delta x_3^{(k)}$

LOOP End

7. CONCLUSIONS

Solving the coupled equations of large, multiply connected systems involves many numerical computations which must be carried out efficiently when the equations are solved many times. Until recently, most general purpose programs have assembled the necessary coefficient matrices and relied on well developed external programs to numerically manipulate and solve the resulting linear equation systems. The need for fast or possibly real time solutions has prompted development of recursive strategies to symbolically uncouple the equations. These recursive algorithms are ideally suited for long sequential systems but not for parallel structures. Furthermore, their highly recursive nature precludes effective use of parallel or vector processors. To address the above problems, this paper first showed how these basic methodologies could be obtained from an optimal symbolic block matrix factorization, and it presented a recursive algorithm. From this algorithm, symbolic natural factors of an equivalent generalized coefficient matrix were obtained. It was suggested that iterative refinement would allow some of the more computationally intensive recursive operations to be bypassed or transferred to other computers for parallel processing. In this development, some of the recursive steps were eliminated by using others to generate a natural factorization of the generalized matrix. The resulting algorithm has computational overhead which can vary from $O(p)$ for highly parallel to $O(p^2)$ for serial systems. Exploiting iterative refinement and taking advantage of vectorization and parallel processing can effectively reduce many $O(p^2)$ problems to $O(p)$.
APPENDIX A

Prove Eqs. 46 and 47. Since the matrices in Eqs. 44 to 47 are nonsingular, it is sufficient to show that

$$L_{33}L_{33}^{-1} = [I + F_{31}[R_{11} - I]H_{13}]\left[I + F_{31}[C_{11} - I][I + P_{11}[C_{11} - I]]^{-1}H_{13}\right] = I \tag{A1}$$

and

$$U_{44}^{-1}U_{44} = \left[I + H_{42}[I + [C_{22} - I]P_{22}]^{-1}[C_{22} - I]F_{24}\right][I + H_{42}[R_{22} - I]F_{24}] = I \tag{A2}$$

The following identities

$$R_{11}C_{11} = C_{11}R_{11} = I \tag{3}$$

$$R_{22}C_{22} = C_{22}R_{22} = I \tag{4}$$

$$H_{13}F_{31} = I - P_{11} \tag{27}$$

and

$$F_{24}H_{42} = I - P_{22} \tag{25}$$

will be useful when proving Eqs. A1 and A2. Since the expansion and simplification of Eqs. A1 and A2 requires some tricks and juggling of terms, several intermediate steps have been shown to assist the reader in following the proofs. Keep in mind, as throughout this paper, that the identity matrices appearing in these equations will have different dimensions according to the matrices they appear with. Thus

$$\begin{aligned} L_{33}L_{33}^{-1} &= [I + F_{31}[R_{11} - I]H_{13}]\left[I + F_{31}[C_{11} - I][I + P_{11}[C_{11} - I]]^{-1}H_{13}\right] \\ &= I + F_{31}[R_{11} - I]H_{13} + F_{31}\left[I + [R_{11} - I][I - P_{11}]\right][C_{11} - I][I + P_{11}[C_{11} - I]]^{-1}H_{13} \\ &= I + F_{31}[R_{11} - I]H_{13} + F_{31}[R_{11} - R_{11}P_{11} + P_{11}][C_{11} - I][I + P_{11}[C_{11} - I]]^{-1}H_{13} \\ &= I + F_{31}[R_{11} - I]H_{13} - F_{31}[R_{11} - I][I + P_{11}[C_{11} - I]][I + P_{11}[C_{11} - I]]^{-1}H_{13} \\ &= I + F_{31}[R_{11} - I]H_{13} - F_{31}[R_{11} - I]H_{13} \\ &= I \end{aligned} \tag{A3}$$

and

$$\begin{aligned} U_{44}^{-1}U_{44} &= \left[I + H_{42}[I + [C_{22} - I]P_{22}]^{-1}[C_{22} - I]F_{24}\right][I + H_{42}[R_{22} - I]F_{24}] \\ &= I + H_{42}[R_{22} - I]F_{24} + H_{42}[I + [C_{22} - I]P_{22}]^{-1}[C_{22} - I]F_{24}[I + H_{42}[R_{22} - I]F_{24}] \\ &= I + H_{42}[R_{22} - I]F_{24} + H_{42}[I + [C_{22} - I]P_{22}]^{-1}[C_{22} - I][P_{22} - P_{22}R_{22} + R_{22}]F_{24} \\ &= I + H_{42}[R_{22} - I]F_{24} - H_{42}[I + [C_{22} - I]P_{22}]^{-1}[I + [C_{22} - I]P_{22}][R_{22} - I]F_{24} \\ &= I + H_{42}[R_{22} - I]F_{24} - H_{42}[R_{22} - I]F_{24} \\ &= I \end{aligned} \tag{A4}$$

which completes the proof.
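APPENDIX B

Consider the requirements for the coefficient matrix of Eq. 1, repeated below, to be nonsingular.

$$\begin{bmatrix} A_{21} & -C_{22} & 0 \\ C_{11} & 0 & -H_{13} \\ 0 & H_{42} & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_2 \\ b_1 \\ b_4 \end{bmatrix} \tag{1}$$

$$\begin{bmatrix} A_{21} & -C_{22} & 0 \\ C_{11} & 0 & -H_{13} \\ 0 & H_{42} & 0 \end{bmatrix} = \begin{bmatrix} X & Y \\ Z & 0 \end{bmatrix} \tag{B1}$$

where

$$X = \begin{bmatrix} A_{21} & -C_{22} \\ C_{11} & 0 \end{bmatrix} \tag{B2}$$

$$Y = \begin{bmatrix} 0 \\ -H_{13} \end{bmatrix} \tag{B3}$$

$$Z = \begin{bmatrix} 0 & H_{42} \end{bmatrix} \tag{B4}$$

$$\begin{bmatrix} X & Y \\ Z & 0 \end{bmatrix}^{-1} = \begin{bmatrix} X^{-1} - X^{-1}Y[ZX^{-1}Y]^{-1}ZX^{-1} & X^{-1}Y[ZX^{-1}Y]^{-1} \\ [ZX^{-1}Y]^{-1}ZX^{-1} & -[ZX^{-1}Y]^{-1} \end{bmatrix} \tag{B5}$$

$$X^{-1} = \begin{bmatrix} 0 & R_{11} \\ -R_{22} & R_{22}A_{21}R_{11} \end{bmatrix} \tag{B6}$$

Substituting Eqs. B1 to B4 and B6 into Eq. B5 gives

$$\begin{bmatrix} A_{21} & -C_{22} & 0 \\ C_{11} & 0 & -H_{13} \\ 0 & H_{42} & 0 \end{bmatrix}^{-1} = \begin{bmatrix} R_{11}H_{13}A_{43}^{-1}H_{42}R_{22} & R_{11} - R_{11}H_{13}A_{43}^{-1}H_{42}R_{22}A_{21}R_{11} & R_{11}H_{13}A_{43}^{-1} \\ -R_{22} + R_{22}A_{21}R_{11}H_{13}A_{43}^{-1}H_{42}R_{22} & R_{22}A_{21}R_{11} - R_{22}A_{21}R_{11}H_{13}A_{43}^{-1}H_{42}R_{22}A_{21}R_{11} & R_{22}A_{21}R_{11}H_{13}A_{43}^{-1} \\ A_{43}^{-1}H_{42}R_{22} & -A_{43}^{-1}H_{42}R_{22}A_{21}R_{11} & A_{43}^{-1} \end{bmatrix} \tag{B7}$$

where $A_{43}$ is the generalized coefficient matrix $H_{42}R_{22}A_{21}R_{11}H_{13}$ in Eq. 2. For this inverse to exist, it is necessary and sufficient for $A_{43}$ to be invertible.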
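Eqs. 56, B6 and B7 can be checked numerically. In the Python/NumPy sketch below all data are hypothetical and random; $A_{21}$ is deliberately built with rank 3 out of 5 to illustrate that the submatrix of Eq. 56 is invertible regardless of the rank of $A_{21}$, while the full inverse of Eq. B7 only requires $A_{43} = H_{42}R_{22}A_{21}R_{11}H_{13}$ to be invertible, which holds for this generic data.

```python
import numpy as np

rng = np.random.default_rng(6)
n1, n3 = 5, 2
C11 = np.eye(n1) + np.tril(rng.standard_normal((n1, n1)), -1)      # unit lower triangular
C22 = C11.T                                                        # assumption, as in Appendix B
A21 = rng.standard_normal((n1, 3)) @ rng.standard_normal((3, n1))  # rank 3: singular
H13 = rng.standard_normal((n1, n3))
H42 = rng.standard_normal((n3, n1))
R11, R22 = np.linalg.inv(C11), np.linalg.inv(C22)

# Eq. 56 / B6: X^-1 in closed form, even though A21 is singular.
X = np.block([[A21, -C22], [C11, np.zeros((n1, n1))]])
X_inv = np.block([[np.zeros((n1, n1)), R11], [-R22, R22 @ A21 @ R11]])
assert np.allclose(X_inv @ X, np.eye(2 * n1))

# Eq. B7: full inverse of the coefficient matrix of Eq. 1.
K = np.block([[A21, -C22, np.zeros((n1, n3))],
              [C11, np.zeros((n1, n1)), -H13],
              [np.zeros((n3, n1)), H42, np.zeros((n3, n3))]])
A43_inv = np.linalg.inv(H42 @ R22 @ A21 @ R11 @ H13)
G = R11 @ H13 @ A43_inv @ H42 @ R22                                # recurring product
K_inv = np.block([
    [G, R11 - G @ A21 @ R11, R11 @ H13 @ A43_inv],
    [-R22 + R22 @ A21 @ G, R22 @ A21 @ R11 - R22 @ A21 @ G @ A21 @ R11,
     R22 @ A21 @ R11 @ H13 @ A43_inv],
    [A43_inv @ H42 @ R22, -A43_inv @ H42 @ R22 @ A21 @ R11, A43_inv]])
assert np.allclose(K_inv @ K, np.eye(2 * n1 + n3))
```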

While the inverse in Eq. B7 is of theoretical interest, it has no immediate value. However, block diagonal inverses of the above matrix are the basis for the algorithms developed in this paper. A rather simple, yet extensive, six-element example shown in Fig. 1 is given to illustrate and help explain the algorithms. Note that this example contains no closed loops and illustrates both serial and parallel tree structure. Also, the system is described by a graph since $C_2 = C_1^T$, and thus all connected elements communicate in both directions. And finally, the coefficient matrix is assumed to be nonsingular. The following matrices for this example are included to illustrate the structure of Eq. 1.


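$$C_1 = \begin{bmatrix} 1&0&0&0&0&0 \\ -1&1&0&0&0&0 \\ 0&-1&1&0&0&0 \\ 0&-1&0&1&0&0 \\ -1&0&0&0&1&0 \\ 0&0&0&0&-1&1 \end{bmatrix} \tag{B8}$$

$$C_2 = \begin{bmatrix} 1&-1&0&0&-1&0 \\ 0&1&-1&-1&0&0 \\ 0&0&1&0&0&0 \\ 0&0&0&1&0&0 \\ 0&0&0&0&1&-1 \\ 0&0&0&0&0&1 \end{bmatrix} \tag{B9}$$

$$C_{11} = \begin{bmatrix} I&0&0&0&0&0 \\ -E_{11b}&I&0&0&0&0 \\ 0&-E_{11c}&I&0&0&0 \\ 0&-E_{11d}&0&I&0&0 \\ -E_{11e}&0&0&0&I&0 \\ 0&0&0&0&-E_{11f}&I \end{bmatrix} \tag{B10}$$

$$C_{22} = \begin{bmatrix} I&-E_{22b}&0&0&-E_{22e}&0 \\ 0&I&-E_{22c}&-E_{22d}&0&0 \\ 0&0&I&0&0&0 \\ 0&0&0&I&0&0 \\ 0&0&0&0&I&-E_{22f} \\ 0&0&0&0&0&I \end{bmatrix} \tag{B11}$$

$$R_1 = C_1^{-1} = \begin{bmatrix} 1&0&0&0&0&0 \\ 1&1&0&0&0&0 \\ 1&1&1&0&0&0 \\ 1&1&0&1&0&0 \\ 1&0&0&0&1&0 \\ 1&0&0&0&1&1 \end{bmatrix} \tag{B12}$$

$$R_2 = C_2^{-1} = \begin{bmatrix} 1&1&1&1&1&1 \\ 0&1&1&1&0&0 \\ 0&0&1&0&0&0 \\ 0&0&0&1&0&0 \\ 0&0&0&0&1&1 \\ 0&0&0&0&0&1 \end{bmatrix} \tag{B13}$$

$$R_{11} = C_{11}^{-1} = \begin{bmatrix} I&0&0&0&0&0 \\ E_{11b}&I&0&0&0&0 \\ E_{11cb}&E_{11c}&I&0&0&0 \\ E_{11db}&E_{11d}&0&I&0&0 \\ E_{11e}&0&0&0&I&0 \\ E_{11fe}&0&0&0&E_{11f}&I \end{bmatrix} \tag{B14}$$

$$R_{22} = C_{22}^{-1} = \begin{bmatrix} I&E_{22b}&E_{22bc}&E_{22bd}&E_{22e}&E_{22ef} \\ 0&I&E_{22c}&E_{22d}&0&0 \\ 0&0&I&0&0&0 \\ 0&0&0&I&0&0 \\ 0&0&0&0&I&E_{22f} \\ 0&0&0&0&0&I \end{bmatrix} \tag{B15}$$

$$E_{11cb} = E_{11c}E_{11b}, \ \text{etc.} \tag{B16}$$

$$E_{22bc} = E_{22b}E_{22c}, \ \text{etc.} \tag{B17}$$

$$A_{21} = \mathrm{diag}[A_{21a}, A_{21b}, A_{21c}, A_{21d}, A_{21e}, A_{21f}] \tag{B18}$$

$$H_{13} = \mathrm{diag}[H_{13a}, H_{13b}, H_{13c}, H_{13d}, H_{13e}, H_{13f}] \tag{B19}$$

$$H_{42} = \mathrm{diag}[H_{42a}, H_{42b}, H_{42c}, H_{42d}, H_{42e}, H_{42f}] \tag{B20}$$

$$x_1 = [x_{1a}^T\ x_{1b}^T\ x_{1c}^T\ x_{1d}^T\ x_{1e}^T\ x_{1f}^T]^T \tag{B21}$$

$$x_2 = [x_{2a}^T\ x_{2b}^T\ x_{2c}^T\ x_{2d}^T\ x_{2e}^T\ x_{2f}^T]^T \tag{B22}$$

$$x_3 = [x_{3a}^T\ x_{3b}^T\ x_{3c}^T\ x_{3d}^T\ x_{3e}^T\ x_{3f}^T]^T \tag{B23}$$

$$b_1 = [b_{1a}^T\ b_{1b}^T\ b_{1c}^T\ b_{1d}^T\ b_{1e}^T\ b_{1f}^T]^T \tag{B24}$$

$$b_2 = [b_{2a}^T\ b_{2b}^T\ b_{2c}^T\ b_{2d}^T\ b_{2e}^T\ b_{2f}^T]^T \tag{B25}$$

$$b_4 = [b_{4a}^T\ b_{4b}^T\ b_{4c}^T\ b_{4d}^T\ b_{4e}^T\ b_{4f}^T]^T \tag{B26}$$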
(B26) 
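Combining the above equations gives the composite representation of Eq. 1 as

$$
\left[\begin{array}{cccccc|cccccc|cccccc}
A_{21a} & 0 & 0 & 0 & 0 & 0 & -I & E_{ab} & 0 & 0 & E_{ae} & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & A_{21b} & 0 & 0 & 0 & 0 & 0 & -I & E_{bc} & E_{bd} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & A_{21c} & 0 & 0 & 0 & 0 & 0 & -I & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & A_{21d} & 0 & 0 & 0 & 0 & 0 & -I & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & A_{21e} & 0 & 0 & 0 & 0 & 0 & -I & E_{ef} & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & A_{21f} & 0 & 0 & 0 & 0 & 0 & -I & 0 & 0 & 0 & 0 & 0 & 0\\ \hline
I & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -H_{13a} & 0 & 0 & 0 & 0 & 0\\
-E_{ba} & I & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -H_{13b} & 0 & 0 & 0 & 0\\
0 & -E_{cb} & I & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -H_{13c} & 0 & 0 & 0\\
0 & -E_{db} & 0 & I & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -H_{13d} & 0 & 0\\
-E_{ea} & 0 & 0 & 0 & I & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -H_{13e} & 0\\
0 & 0 & 0 & 0 & -E_{fe} & I & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -H_{13f}\\ \hline
0 & 0 & 0 & 0 & 0 & 0 & H_{43a} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & H_{43b} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & H_{43c} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & H_{43d} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & H_{43e} & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & H_{43f} & 0 & 0 & 0 & 0 & 0 & 0
\end{array}\right]
\begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix}
=
\begin{bmatrix} b_2\\ b_1\\ b_4 \end{bmatrix}
\tag{B27}
$$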

which could be inverted symbolically using Eq. B7 (and efficiently, too, if one takes maximum advantage of recursion). However, an optimal block U-L factorization of Eq. B27 is sought (one in which the element symbols can be kept in their naturally occurring order), so first permute it into the following form:

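$$
\left[\begin{array}{ccc|ccc|ccc|ccc|ccc|ccc}
A_{21a} & -I & 0 & 0 & E_{ab} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & E_{ae} & 0 & 0 & 0 & 0\\
I & 0 & -H_{13a} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & H_{43a} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ \hline
0 & 0 & 0 & A_{21b} & -I & 0 & 0 & E_{bc} & 0 & 0 & E_{bd} & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
-E_{ba} & 0 & 0 & I & 0 & -H_{13b} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & H_{43b} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ \hline
0 & 0 & 0 & 0 & 0 & 0 & A_{21c} & -I & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & -E_{cb} & 0 & 0 & I & 0 & -H_{13c} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & H_{43c} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ \hline
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & A_{21d} & -I & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & -E_{db} & 0 & 0 & 0 & 0 & 0 & I & 0 & -H_{13d} & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & H_{43d} & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ \hline
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & A_{21e} & -I & 0 & 0 & E_{ef} & 0\\
-E_{ea} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & I & 0 & -H_{13e} & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & H_{43e} & 0 & 0 & 0 & 0\\ \hline
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & A_{21f} & -I & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -E_{fe} & 0 & 0 & I & 0 & -H_{13f}\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & H_{43f} & 0
\end{array}\right]
\begin{bmatrix} x_{1a}\\ x_{2a}\\ x_{3a}\\ x_{1b}\\ x_{2b}\\ x_{3b}\\ x_{1c}\\ x_{2c}\\ x_{3c}\\ x_{1d}\\ x_{2d}\\ x_{3d}\\ x_{1e}\\ x_{2e}\\ x_{3e}\\ x_{1f}\\ x_{2f}\\ x_{3f} \end{bmatrix}
=
\begin{bmatrix} b_{2a}\\ b_{1a}\\ b_{4a}\\ b_{2b}\\ b_{1b}\\ b_{4b}\\ b_{2c}\\ b_{1c}\\ b_{4c}\\ b_{2d}\\ b_{1d}\\ b_{4d}\\ b_{2e}\\ b_{1e}\\ b_{4e}\\ b_{2f}\\ b_{1f}\\ b_{4f} \end{bmatrix}
\tag{B28}
$$

Note the six block matrices

$$
\begin{bmatrix} A_{21i} & -I & 0\\ I & 0 & -H_{13i}\\ 0 & H_{43i} & 0 \end{bmatrix}, \qquad i = a, b, \ldots, f
\tag{B29}
$$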

on the diagonal, and note that each has the same block structure as Eq. 1 itself. In general, if a pair of block matrices is coupled, the coupling is through one block matrix above the diagonal, one block matrix below the diagonal, or both. In this example, every coupled pair is coupled through both, because the system is represented by a graph. Furthermore, every coupling matrix above the diagonal has the same block structure, and every coupling matrix below the diagonal has the same structure. The reader is encouraged to study the structure of Eq. B28 carefully, because it is the key to the successful implementation of all highly efficient recursive solution algorithms. For block U-L factorization, the matrices in Eq. B29 (or at least one equivalent block matrix at each stage of the elimination process) become the pivotal matrices and must be invertible.

Before inverting a typical matrix in Eq. B29, it will be instructive to illustrate the standard elimination process in block U-L factorization. Let the matrix equation

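$$
M\,x = b
\tag{B30}
$$

be partitioned into

$$
\begin{bmatrix} M_{11} & M_{12}\\ M_{21} & M_{22} \end{bmatrix}
\begin{bmatrix} x_1\\ x_2 \end{bmatrix}
=
\begin{bmatrix} b_1\\ b_2 \end{bmatrix}
\tag{B31}
$$

where the block matrix $M_{22}$ is small and nonsingular. It then follows that

$$
x_2 = M_{22}^{-1}\,[b_2 - M_{21}\,x_1]
\tag{B32}
$$

and

$$
[M_{11} - M_{12}M_{22}^{-1}M_{21}]\,x_1 = [b_1 - M_{12}M_{22}^{-1}b_2]
\tag{B33}
$$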

Generating the coefficient matrix and the right-hand side of Eq. B33 is the first step of block U-L factorization. The second step treats Eq. B33 as a new matrix equation. That equation is, in turn, partitioned as in Eq. B31, and the process continues until the remaining coefficient matrix in Eq. B33 is easily invertible. At this point, the preprocessing for elimination is complete, and the equations generated by Eq. B32 can be used for back substitution.
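The elimination step of Eqs. B31–B33 can be tried numerically. The following is a minimal sketch that assumes NumPy and a dense, randomly generated, well-conditioned matrix. The function name `block_ul_step` and the test data are hypothetical and do not come from the paper. The sketch eliminates the trailing block first, solves the reduced system of Eq. B33, and then recovers the eliminated unknowns by the back substitution of Eq. B32.

```python
import numpy as np

def block_ul_step(M, b, k):
    """Solve M x = b by eliminating the trailing k unknowns first (Eqs. B31-B33).

    M is partitioned as [[M11, M12], [M21, M22]], where M22 is the trailing
    k x k block and is assumed to be nonsingular.
    """
    p = M.shape[0] - k
    M11, M12, M21, M22 = M[:p, :p], M[:p, p:], M[p:, :p], M[p:, p:]
    b1, b2 = b[:p], b[p:]
    # Eq. B33: reduced system for x1 (solve with M22 instead of forming its inverse)
    S = M11 - M12 @ np.linalg.solve(M22, M21)
    r = b1 - M12 @ np.linalg.solve(M22, b2)
    x1 = np.linalg.solve(S, r)
    # Eq. B32: back substitution for the eliminated unknowns x2
    x2 = np.linalg.solve(M22, b2 - M21 @ x1)
    return np.concatenate([x1, x2])

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6)) + 6.0 * np.eye(6)   # shifted diagonal keeps the example well conditioned
b = rng.standard_normal(6)
x = block_ul_step(M, b, k=2)
print(np.allclose(M @ x, b))
```

The sketch solves against $M_{22}$ rather than forming $M_{22}^{-1}$ explicitly. This is the usual numerical choice, and the algebra is unchanged.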

In this example, the first matrix in Eq. B29 that must be inverted is that of element f. Since this matrix has the same block structure as the matrix in Eq. 1, the same approach as in Eq. B7 gives


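$$
\begin{bmatrix} A_{21f} & -I & 0\\ I & 0 & -H_{13f}\\ 0 & H_{43f} & 0 \end{bmatrix}^{-1}
=
\begin{bmatrix}
H_{13f}[H_{43f}A_{21f}H_{13f}]^{-1}H_{43f} & I - H_{13f}[H_{43f}A_{21f}H_{13f}]^{-1}H_{43f}A_{21f} & H_{13f}[H_{43f}A_{21f}H_{13f}]^{-1}\\
-I + A_{21f}H_{13f}[H_{43f}A_{21f}H_{13f}]^{-1}H_{43f} & A_{21f} - A_{21f}H_{13f}[H_{43f}A_{21f}H_{13f}]^{-1}H_{43f}A_{21f} & A_{21f}H_{13f}[H_{43f}A_{21f}H_{13f}]^{-1}\\
[H_{43f}A_{21f}H_{13f}]^{-1}H_{43f} & -[H_{43f}A_{21f}H_{13f}]^{-1}H_{43f}A_{21f} & [H_{43f}A_{21f}H_{13f}]^{-1}
\end{bmatrix}
\tag{B34}
$$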


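The matrix in Eq. B34 can be further simplified to

$$
\begin{bmatrix} A_{21f} & -I & 0\\ I & 0 & -H_{13f}\\ 0 & H_{43f} & 0 \end{bmatrix}^{-1}
=
\begin{bmatrix}
H_{13f}D_{34f}H_{43f} & I - H_{13f}F_{31f} & H_{13f}D_{34f}\\
-[I - F_{24f}H_{43f}] & A_{21f} - B_{24f}F_{31f} & F_{24f}\\
D_{34f}H_{43f} & -F_{31f} & D_{34f}
\end{bmatrix}
\tag{B35}
$$

by introducing the identities from step ROL2.1:

$$
B_{42f} = H_{43f}A_{21f}
\tag{B36}
$$

$$
D_{34f} = [B_{42f}H_{13f}]^{-1}
\tag{B37}
$$

$$
F_{31f} = D_{34f}B_{42f}
\tag{B38}
$$

$$
B_{24f} = A_{21f}H_{13f}
\tag{B39}
$$

$$
F_{24f} = B_{24f}D_{34f}
\tag{B40}
$$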


The  matrices  in  Eq.  B35  can  be  expressed  in  various  ways  in  terms  of  the  above  submatrices, 
and  the  particular  choices  were  made  either  to  reduce  computational  overhead,  for  use  in  other 
parts  of  the  paper,  or  arbitrarily. 
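The practical payoff of Eq. B35 is that inverting a $(2n+m)\times(2n+m)$ element block requires only the $m\times m$ inverse $D_{34}$. The NumPy sketch below checks the closed form against the block matrix itself. It uses random data for $A_{21}$, $H_{13}$ and $H_{43}$; the sizes $n$ and $m$ and the helper names are chosen here for illustration only.

```python
import numpy as np

def element_block(A21, H13, H43):
    """Diagonal block [[A21, -I, 0], [I, 0, -H13], [0, H43, 0]] of Eq. B29."""
    n, m = H13.shape
    return np.block([[A21, -np.eye(n), np.zeros((n, m))],
                     [np.eye(n), np.zeros((n, n)), -H13],
                     [np.zeros((m, n)), H43, np.zeros((m, m))]])

def element_inverse(A21, H13, H43):
    """Closed-form inverse of the element block, following Eqs. B35-B40."""
    n = A21.shape[0]
    I = np.eye(n)
    B42 = H43 @ A21                    # Eq. B36
    D34 = np.linalg.inv(B42 @ H13)     # Eq. B37: the only explicit inverse (m x m)
    F31 = D34 @ B42                    # Eq. B38
    B24 = A21 @ H13                    # Eq. B39
    F24 = B24 @ D34                    # Eq. B40
    return np.block([[H13 @ D34 @ H43, I - H13 @ F31, H13 @ D34],
                     [-(I - F24 @ H43), A21 - B24 @ F31, F24],
                     [D34 @ H43, -F31, D34]])

rng = np.random.default_rng(1)
n, m = 3, 2                            # hypothetical sizes of x1/x2 (n) and x3 (m)
A21 = rng.standard_normal((n, n))
H13 = rng.standard_normal((n, m))
H43 = rng.standard_normal((m, n))
Minv = element_inverse(A21, H13, H43)
print(np.allclose(Minv @ element_block(A21, H13, H43), np.eye(2 * n + m)))
```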

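Now consider the first computation of block matrix products from the matrix in Eq. B28, corresponding to the term $-M_{12}M_{22}^{-1}M_{21}$ in Eq. B33. The factors have the following structure:

- $M_{22}$ is the diagonal block of element f, whose inverse is given by Eq. B35.
- The only nonzero block of $M_{12}$ is $E_{ef}$. It multiplies $x_{2f}$ in the first block row of element e.
- The only nonzero block of $M_{21}$ is $-E_{fe}$. It multiplies $x_{1e}$ in the second block row of element f.

The product is

$$
-M_{12}\,M_{22}^{-1}\,M_{21} =
\left[\begin{array}{ccc|ccc}
0 & \cdots & 0 & 0 & 0 & 0\\
\vdots & \ddots & \vdots & \vdots & \vdots & \vdots\\
0 & \cdots & 0 & 0 & 0 & 0\\ \hline
0 & \cdots & 0 & E_{ef}\,[A_{21f} - B_{24f}F_{31f}]\,E_{fe} & 0 & 0\\
0 & \cdots & 0 & 0 & 0 & 0\\
0 & \cdots & 0 & 0 & 0 & 0
\end{array}\right]
\tag{B41}
$$

where the lower-right partition corresponds to the unknowns $x_{1e}$, $x_{2e}$ and $x_{3e}$ of element e.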
This enormous product generates exactly one nonzero submatrix, $E_{ef}[A_{21f} - B_{24f}F_{31f}]E_{fe}$. When this submatrix is added to the remaining submatrix of Eq. B28 to form the equivalent of $M_{11} - M_{12}M_{22}^{-1}M_{21}$ in Eq. B33, it falls directly on top of the submatrix $A_{21e}$ of the parent element e. This is a consequence of the optimal ordering (pivotal) strategy and results in a minimal block matrix fill pattern (in this case, zero block fill-in). Thus the entire operation can be described by a single equation from step ROL2.1:

$$
A^{e}_{21e} = A_{21e} + E_{ef}\,[A_{21f} - B_{24f}F_{31f}]\,E_{fe}
\tag{B42}
$$

Eq. B42 differs from the last equation in step ROL2.1 for two reasons. First, all $A^{e}_{21}$ are initialized to $A_{21}$ in step ROL2.0. Second, a given parent $A^{e}_{21}$ may eventually accumulate quantities from more than one child, as elements a and b do in this example. Equation B42 can be read as "this step of the elimination process is equivalent to projecting $A_{21f}$ from child f onto parent e across the interface between the two elements." The quantity $A_{21f} - B_{24f}F_{31f}$ makes it across the interface and then undergoes the transformation $E_{ef}[A_{21f} - B_{24f}F_{31f}]E_{fe}$ to match the coordinates of element e. The remaining quantity $H_{43f}B_{24f}F_{31f}H_{13f} = H_{43f}A_{21f}H_{13f} = D_{34f}^{-1}$ gets projected onto the element interface subspace 3 as the coefficient of $x_{3f}$ (see Eq. 58). The superscript e on the left matrix in Eq. B42 denotes an effective quantity, because the original was modified by the projection process.
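Eq. B42 can be checked against a brute-force Schur complement. The sketch below assumes a two-element system (parent e, child f) arranged as in Eq. B28, with random blocks and hypothetical sizes $n = 3$, $m = 2$. It eliminates element f with Eq. B33. It then compares the result with element e's diagonal block in which only $A_{21e}$ has been replaced by $A^{e}_{21e}$, taking $B_{24f} = A_{21f}H_{13f}$.

```python
import numpy as np

def element_block(A21, H13, H43):
    """Diagonal block [[A21, -I, 0], [I, 0, -H13], [0, H43, 0]] of Eq. B29."""
    n, m = H13.shape
    return np.block([[A21, -np.eye(n), np.zeros((n, m))],
                     [np.eye(n), np.zeros((n, n)), -H13],
                     [np.zeros((m, n)), H43, np.zeros((m, m))]])

rng = np.random.default_rng(2)
n, m = 3, 2                                   # hypothetical block sizes
s = 2 * n + m
A21e, A21f = rng.standard_normal((2, n, n)) + 3.0 * np.eye(n)
H13e, H13f = rng.standard_normal((2, n, m))
H43e, H43f = rng.standard_normal((2, m, n))
Eef, Efe = rng.standard_normal((2, n, n))

M11 = element_block(A21e, H13e, H43e)         # parent e
M22 = element_block(A21f, H13f, H43f)         # child f, eliminated first
M12 = np.zeros((s, s))
M12[:n, n:2 * n] = Eef                        # e's first block row couples to x_2f
M21 = np.zeros((s, s))
M21[n:2 * n, :n] = -Efe                       # f's second block row couples to x_1e

S = M11 - M12 @ np.linalg.solve(M22, M21)     # Schur complement, Eq. B33

B42 = H43f @ A21f                             # Eq. B36
F31 = np.linalg.solve(B42 @ H13f, B42)        # Eqs. B37-B38: F31 = D34 B42
B24 = A21f @ H13f                             # B24 = A21 H13
A21e_eff = A21e + Eef @ (A21f - B24 @ F31) @ Efe   # Eq. B42

print(np.allclose(S, element_block(A21e_eff, H13e, H43e)))
```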


Now consider the equivalent of computing the quantity $-M_{12}M_{22}^{-1}b_2$ for the right-hand side of Eq. B28. Here the right-hand-side partition belonging to element f consists of $b_{2f}$, $b_{1f}$ and $b_{4f}$, and the computation gives

$$
-M_{12}\,M_{22}^{-1}\begin{bmatrix} b_{2f}\\ b_{1f}\\ b_{4f} \end{bmatrix}
=
\begin{bmatrix} 0\\ \vdots\\ 0\\ E_{ef}\big[[I - F_{24f}H_{43f}][b_{2f} - A_{21f}\,b_{1f}] - F_{24f}\,b_{4f}\big]\\ 0\\ 0 \end{bmatrix}
\tag{B43}
$$

As above, this enormous product generates exactly one nonzero subvector, $E_{ef}\big[[I - F_{24f}H_{43f}][b_{2f} - A_{21f}b_{1f}] - F_{24f}b_{4f}\big]$. When this subvector is added to the remaining subvector of Eq. B28 to form the equivalent of $b_1 - M_{12}M_{22}^{-1}b_2$ in Eq. B33, the single term falls directly on top of the subvector $b_{2e}$ of the parent e. This again is a consequence of the optimal ordering strategy. The entire operation is described by the following two equations from step ROL3.1:

$$
x^{e}_{3f} = D_{34f}[b_{4f} + H_{43f}\,b_{2f}] - F_{31f}\,b_{1f}
\tag{B44}
$$

and

$$
b^{e}_{2e} = b_{2e} + E_{ef}\big[b_{2f} - A_{21f}[b_{1f} + H_{13f}\,x^{e}_{3f}]\big]
\tag{B45}
$$

The reader can verify by substitution that these two equations are equivalent to the right-hand side of Eq. B33. The intermediate step was introduced to reduce the computations necessary for step ROL4.1 of the algorithm.

Again, Eq. B45 differs from the second equation in step ROL3.1 for two reasons: the initialization $b^{e}_{2} = b_{2}$ in step ROL3.0, and the need to accumulate projected quantities from children into the parent $b^{e}_{2}$. As above, Eqs. B44 and B45 can be read as "this step of the elimination process is equivalent to projecting the right-hand-side quantities from child f onto parent e across the communicating interface between the two elements." The quantity $[I - F_{24f}H_{43f}][b_{2f} - A_{21f}b_{1f}] - F_{24f}b_{4f}$ makes it across the interface. It then undergoes the transformation $E_{ef}\big[[I - F_{24f}H_{43f}][b_{2f} - A_{21f}b_{1f}] - F_{24f}b_{4f}\big]$ to match the coordinates of element e. As above, the e superscripts in Eqs. B44 and B45 denote effective quantities.
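The right-hand-side projection can be confirmed numerically in the same way. This sketch reuses the two-element arrangement, again with random data and hypothetical sizes. It compares $b_1 - M_{12}M_{22}^{-1}b_2$ from Eq. B33 with the two-step route of Eqs. B44 and B45. Only the parent's $b_{2e}$ entry changes.

```python
import numpy as np

def element_block(A21, H13, H43):
    """Diagonal block [[A21, -I, 0], [I, 0, -H13], [0, H43, 0]] of Eq. B29."""
    n, m = H13.shape
    return np.block([[A21, -np.eye(n), np.zeros((n, m))],
                     [np.eye(n), np.zeros((n, n)), -H13],
                     [np.zeros((m, n)), H43, np.zeros((m, m))]])

rng = np.random.default_rng(4)
n, m = 3, 2                                   # hypothetical block sizes
s = 2 * n + m
A21f = rng.standard_normal((n, n)) + 3.0 * np.eye(n)
H13f, H43f = rng.standard_normal((n, m)), rng.standard_normal((m, n))
Eef = rng.standard_normal((n, n))
b2e, b1e, b4e = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(m)
b2f, b1f, b4f = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(m)

M12 = np.zeros((s, s))
M12[:n, n:2 * n] = Eef                        # parent e's first block row couples to x_2f
M22 = element_block(A21f, H13f, H43f)

# Right-hand side of Eq. B33, computed directly.
r = np.concatenate([b2e, b1e, b4e]) - M12 @ np.linalg.solve(M22, np.concatenate([b2f, b1f, b4f]))

# The same result through Eqs. B44 and B45.
B42 = H43f @ A21f
D34 = np.linalg.inv(B42 @ H13f)
F31 = D34 @ B42
x3f_e = D34 @ (b4f + H43f @ b2f) - F31 @ b1f                  # Eq. B44
b2e_eff = b2e + Eef @ (b2f - A21f @ (b1f + H13f @ x3f_e))     # Eq. B45
print(np.allclose(r, np.concatenate([b2e_eff, b1e, b4e])))
```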


Now that these computations have been completed, the reduced matrix corresponding to Eq. B33, which fits the mold of Eq. B30, can again be partitioned according to Eq. B31 as

(B46)

Element f and all quantities associated with it have been completely eliminated from the reduced system of equations, with the introduction of the two modified terms $A^{e}_{21e}$ and $b^{e}_{2e}$ in Eq. B46. This new equation indicates that child e is connected to parent a, so the next elimination step will project the modified child e onto its parent a. These steps are summarized as follows:

(B55)

and

$$
x^{e}_{3e} = D_{34e}[b_{4e} + H_{43e}\,b^{e}_{2e}] - F_{31e}\,b_{1e}
\tag{B56}
$$

$$
b^{e}_{2a} = b_{2a} + E_{ae}\big[b^{e}_{2e} - A^{e}_{21e}[b_{1e} + H_{13e}\,x^{e}_{3e}]\big]
\tag{B57}
$$


Now the new reduced matrix corresponding to Eq. B33 takes the following form:

(B58)


Elements e and f have been completely eliminated, and only the two new terms $A^{e}_{21a}$ and $b^{e}_{2a}$ were generated. For reference, the remaining reduced matrices are shown without the intermediate symbolic steps:

(B59)


(B60) 


(B61)

At this point, one might envision the recursive block elimination process as folding, or collapsing, one leaf (child element) at a time onto its parent as each additional element is eliminated. Each fold either forms a new equivalent leaf or completely eliminates a branch. Eventually only a single equivalent leaf (or the equivalent root) remains, namely Eq. B61 in this example. The matrix in Eq. B61 can now be inverted, yielding

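$$
\begin{bmatrix} x_{1a}\\ x_{2a}\\ x_{3a} \end{bmatrix}
=
\begin{bmatrix}
H_{13a}D_{34a}H_{43a} & I - H_{13a}F_{31a} & H_{13a}D_{34a}\\
-[I - F_{24a}H_{43a}] & A^{e}_{21a} - B_{24a}F_{31a} & F_{24a}\\
D_{34a}H_{43a} & -F_{31a} & D_{34a}
\end{bmatrix}
\begin{bmatrix} b^{e}_{2a}\\ b_{1a}\\ b_{4a} \end{bmatrix}
\tag{B62}
$$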


Looking back to Eq. 1 or Eqs. 5 and 6, it is apparent that the components of $x_{1a}$ and $x_{2a}$ can be obtained more efficiently from

$$
C_1\,x_1 = b_1 + H_{13}\,x_3
\tag{1a}
$$

$$
C_2\,x_2 = -[b_2 - A_{21}\,x_1]
\tag{1b}
$$

$$
x_1 = R_{11}[b_1 + H_{13}\,x_3]
\tag{5}
$$

and

$$
x_2 = -R_{22}[b_2 - A_{21}\,x_1]
\tag{6}
$$


Thus only the last equation of Eq. B62 is required at this step, giving

$$
x_{3a} = D_{34a}[b_{4a} + H_{43a}\,b^{e}_{2a}] - F_{31a}\,b_{1a}
\tag{B63}
$$

which is just the final step ROL3.2 of the recursive algorithm. Substituting this result into the first block equation of Eq. 1a gives the first step in ROL4 of the recursive algorithm:


$$
x_{1a} = b_{1a} + H_{13a}\,x_{3a}
\tag{B64}
$$

Solving for the remaining unknowns requires a return to Eq. B32, $x_2 = M_{22}^{-1}[b_2 - M_{21}x_1]$, for the back (in this example, forward) substitution step of the block U-L factorization algorithm. Eqs. B63 and B64 complete the first step. The next step uses the partitioned Eq. B60 to form, according to Eq. B32,

which yields the necessary equation


$$
x_{3b} = D_{34b}[b_{4b} + H_{43b}\,b^{e}_{2b}] - F_{31b}\,b_{1b} - F_{31b}\,E_{ba}\,x_{1a}
\tag{B66}
$$

However, referring back to Eq. B44 with b substituted for f, one sees that the term

$$
x^{e}_{3b} = D_{34b}[b_{4b} + H_{43b}\,b^{e}_{2b}] - F_{31b}\,b_{1b}
\tag{B67}
$$

was already computed during the recursive elimination steps, so Eq. B66 can be modified as

$$
x_{3b} = x^{e}_{3b} - F_{31b}[E_{ba}\,x_{1a}]
\tag{B68}
$$


Now, the second block equation of Eq. 1a yields

$$
x_{1b} = b_{1b} + [E_{ba}\,x_{1a}] + H_{13b}\,x_{3b}
\tag{B69}
$$

Equation B66 defines the first step ROL4, and Eqs. B68 and B69 define the equations for the recursive steps ROL4.1. One more step is developed below to illustrate this part of the algorithm more clearly. Starting with the reduced equation, Eq. B59, the next substitution step based on Eq. B32 is

which yields

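$$
x_{3c} = x^{e}_{3c} - F_{31c}[E_{cb}\,x_{1b}]
\tag{B71}
$$

where

$$
x^{e}_{3c} = D_{34c}[b_{4c} + H_{43c}\,b_{2c}] - F_{31c}\,b_{1c}
\tag{B72}
$$

was evaluated during the elimination step. Finally, the remaining equation

$$
x_{1c} = b_{1c} + [E_{cb}\,x_{1b}] + H_{13c}\,x_{3c}
\tag{B73}
$$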

comes from the third block equation of Eq. 1a. This process repeats, in the reverse order of the elimination steps, until all of the leaves have been visited. As in the earlier discussion, the matrix equations at each substitution step expand to include one additional element. Substitution can therefore be envisioned as unfolding the previously collapsed tree one leaf at a time, in the reverse order of its folding, until the tree has been completely returned to its original configuration.
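The following is a minimal sketch of the whole recursion, folding and unfolding together, for the six-element tree of this appendix. Element a is the root; b and e are children of a, c and d are children of b, and f is a child of e. The sketch is written directly from Eqs. B42, B44, B45, B63, B64, B68 and B69 rather than from the ROL step listings, which are not reproduced here. It uses random blocks with hypothetical sizes, and it inverts only the $m\times m$ matrices $H_{43}A^{e}_{21}H_{13}$. The $x_2$ unknowns are recovered afterwards from Eq. 1b, in the spirit of step ROL5; they are solved leaves first because $C_2$ is block upper triangular. The dense matrix of Eq. B28 is assembled only to check the residual.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2                      # hypothetical sizes: x1 and x2 have n rows, x3 has m rows
elems = ["a", "b", "c", "d", "e", "f"]
parent = {"b": "a", "c": "b", "d": "b", "e": "a", "f": "e"}   # tree of the six-element example
children = {i: [j for j in elems if parent.get(j) == i] for i in elems}

def rand(*shape):
    return rng.standard_normal(shape)

# Random element data (illustrative values only).
A21 = {i: rand(n, n) + 3.0 * np.eye(n) for i in elems}
H13 = {i: rand(n, m) for i in elems}
H43 = {i: rand(m, n) for i in elems}
b1 = {i: rand(n) for i in elems}
b2 = {i: rand(n) for i in elems}
b4 = {i: rand(m) for i in elems}
E = {}
for j, i in parent.items():
    E[i, j] = 0.5 * rand(n, n)   # E_ij: parent row, child x2 column (above the diagonal)
    E[j, i] = 0.5 * rand(n, n)   # E_ji: child row, parent x1 column (below the diagonal)

# Dense matrix in the permuted ordering of Eq. B28, used only for checking.
s = 2 * n + m
off = {i: k * s for k, i in enumerate(elems)}
M, b = np.zeros((6 * s, 6 * s)), np.zeros(6 * s)
for i in elems:
    o = off[i]
    M[o:o + n, o:o + n] = A21[i]
    M[o:o + n, o + n:o + 2 * n] = -np.eye(n)
    M[o + n:o + 2 * n, o:o + n] = np.eye(n)
    M[o + n:o + 2 * n, o + 2 * n:o + s] = -H13[i]
    M[o + 2 * n:o + s, o + n:o + 2 * n] = H43[i]
    b[o:o + s] = np.concatenate([b2[i], b1[i], b4[i]])
for j, i in parent.items():
    M[off[i]:off[i] + n, off[j] + n:off[j] + 2 * n] = E[i, j]
    M[off[j] + n:off[j] + 2 * n, off[i]:off[i] + n] = -E[j, i]

# Elimination: fold each child onto its parent (children before parents).
order = ["f", "e", "c", "d", "b", "a"]
Ae = {i: A21[i].copy() for i in elems}       # effective A21, initialized to A21
b2e = {i: b2[i].copy() for i in elems}       # effective b2, initialized to b2
F31, x3e = {}, {}
for j in order:
    B42 = H43[j] @ Ae[j]
    D34 = np.linalg.inv(B42 @ H13[j])
    F31[j] = D34 @ B42
    x3e[j] = D34 @ (b4[j] + H43[j] @ b2e[j]) - F31[j] @ b1[j]                # Eq. B44
    if j in parent:
        i = parent[j]
        B24 = Ae[j] @ H13[j]
        Ae[i] += E[i, j] @ (Ae[j] - B24 @ F31[j]) @ E[j, i]                  # Eq. B42
        b2e[i] += E[i, j] @ (b2e[j] - Ae[j] @ (b1[j] + H13[j] @ x3e[j]))     # Eq. B45

# Substitution: unfold from the root outward (parents before children).
x1, x3 = {}, {}
for j in reversed(order):
    c = E[j, parent[j]] @ x1[parent[j]] if j in parent else np.zeros(n)
    x3[j] = x3e[j] - F31[j] @ c              # Eq. B63 at the root, Eq. B68 elsewhere
    x1[j] = b1[j] + c + H13[j] @ x3[j]       # Eq. B64 at the root, Eq. B69 elsewhere

# x2 from Eq. 1b, leaves first.
x2 = {}
for j in order:
    x2[j] = A21[j] @ x1[j] - b2[j] + sum((E[j, k] @ x2[k] for k in children[j]), np.zeros(n))

x = np.concatenate([np.concatenate([x1[i], x2[i], x3[i]]) for i in elems])
print(np.allclose(M @ x, b))
```

The processing order is the only place where the tree topology enters. Any ordering that visits every child before its parent gives the same result during folding, and its reverse gives the same result during unfolding.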

Note that the above steps did not evaluate the unknown quantities $x_2$. If they are desired, the remaining step ROL5, based on Eq. 1b or Eq. 6, can be used to compute them.

Finally, the natural symbolic factors of the generalized matrix in step POL3 are evaluated to give a better understanding of the equation structure in the POL algorithm.

and
ABSTRACT

A general symbolic-based method is presented for solving the equations of motion of open-loop kinematic chains consisting of interconnected rigid and deformable bodies. The method uses matrix partitioning, recursive projection based on optimal block U-L factorization, and generalized Newton-Euler equations to obtain an order-n solution of the constrained equations of motion. Kinematic relationships between the absolute reference, joint and elastic coordinates are used with the generalized Newton-Euler equations for deformable bodies to obtain a large, loosely coupled system of equations. Taking advantage of the structure of the inertia matrix associated with the elastic coordinates yields a recursive solution algorithm whose dimension is independent of the number of elastic degrees of freedom. Applied to this system of equations, the above solution techniques yield a much smaller operations count and can exploit vectorization and parallel processing more effectively. The algorithms presented in this paper are illustrated for cylindrical joints and are easily extended to revolute, prismatic, rigid and other joint types.

1. INTRODUCTION

Various  techniques  for  the  dynamic  analysis  of  constrained  mechanical  systems 
consisting  of  interconnected  rigid  and  deformable  bodies  have  been  reported  in  the  literature. 
The  resulting  algorithms  can  be  roughly  divided  into  two  main  categories  depending  on  the  set 
of  coordinates  used  to  derive  the  kinematic  and  dynamic  equations.  The  first  category  employs 
relative  joint  coordinates,  eliminates  constraint  reaction  forces  and  yields  the  smallest,  most 
strongly  coupled  system  of  equations.  Absolute  coordinates  and  joint  reaction  forces  are  used 
to  formulate  the  dynamic  equations  of  motion  in  the  second  category.  This  approach  yields 
relatively  large,  moderately  coupled  systems  of  equations.  However,  the  exclusive  use  of 
absolute  coordinates  introduces  complexities  in  implementing  control  algorithms,  because  the 
joint  variables  are  not  readily  available  when  solving  the  equations  of  motion.  Furthermore, 
many of the algorithms implementing this approach require the use of Newton-Raphson
iteration  to  correct  for  constraint  violations. 

Multibody  mechanical  system  algorithms  generally  employ  joint  models  defining 
topological  networks  of  coupled  equations  which  must  be  solved  by  matrix  and  numerical 
methods.  Featherstone  [1]  presented  a  method  for  calculating  the  acceleration  of  a  robot  in 
response  to  given  actuator  forces.  His  method  is  applicable  to  open-loop  chains  containing 
rigid  bodies,  and  revolute  and  prismatic  joints.  In  this  work,  he  developed  an  algorithm  based 
on  recursive  formulas  involving  quantities  called  articulated-body  inertias  which  represent  the 
effective  inertia  properties  of  multiple  rigid  bodies.  Wehage  [2-4]  extended  and  generalized 
Featherstone's algorithm by developing a general method for obtaining an order n solution for
arbitrary  constrained  equations  of  motion  by  applying  matrix  partitioning  and  recursive 
projection  techniques.  He  also  showed  that  the  recursive  algorithms  are  essentially  the  result  of 
optimal  block  U-L  factorization  applied  to  the  composite  inertia  coefficient  matrix.  The  joint 
kinematics,  equations  of  motion  and  topology  of  a  mechanical  system  are  represented  in 
factored  matrix  form  resulting  in  a  large  system  of  loosely  coupled  equations  amenable  to 
sparse  matrix  manipulation.  Optimal  matrix  permutation,  partitioning  and  recursive  projection 
techniques  are  then  applied  to  symbolically  unravel  and  lay  out  an  order  n  solution  strategy 
which follows the natural topological profile of the system. The method can be applied to
arbitrary  open  and  closed-loop  systems  in  order  to  generate  the  necessary  uncoupled  equations 
[4]. 


In  an  earlier  work,  Armstrong  [5]  developed  a  recursive  inertia  projection  algorithm  for 
robotic  systems  composed  of  spherical  joints.  Stepanenko  and  Vukobratovic  [6]  gave  explicit 
procedures for computer generation and integration of the equations of motion using
Newton-Euler equations. Orin [7] proposed a number of improvements on the scheme of Stepanenko.
The  success  of  Newton-Euler  equations  applied  to  recursive  robotic  manipulator  dynamics  is 
attributed  to  their  simplicity,  and  the  ability  to  express  them  in  closed  form.  A  typical  set  of 
recursive  kinematic  equations  can  be  obtained  by  starting  at  an  arbitrary  link  at  the  end  of  the 
kinematic  tree  and  moving  inward  toward  the  base.  These  kinematic  relationships  along  with 
the  Newton-Euler  equations  yield,  by  simple  matrix  products,  a  compact  set  of  symbolic 
equations  in  terms  of  the  joint  variables. 

In  this  paper,  a  general  symbolic-based  method  is  developed  for  solving  the  equations 
of  motion  for  mechanical  systems  consisting  of  interconnected  rigid  and  deformable  bodies. 
The method utilizes matrix partitioning, recursive projection [2-4] and generalized
Newton-Euler equations [8]. The absolute or reference coordinates of each deformable body in the
system  are  expressed  in  terms  of  body  joint  and  elastic  coordinates.  The  resulting  equations  of 
motion  employing  absolute  coordinates  and  based  on  the  above-mentioned  generalized 
Newton-Euler  equations,  contain  the  nonlinear  inertia  coupling  between  the  so-called  rigid 
body  or  reference  motion  and  the  small  elastic  deformations.  A  significant  portion  of  these 
equations  can  be  expressed  in  terms  of  time-invariant  quantities  which  depend  on  the  assumed 
displacement  field.  The  kinematic  relationships  and  generalized  Newton-Euler  equations  yield 
a  large  system  of  loosely  coupled  equations  amenable  to  sparse  matrix  manipulation.  Direct 
methods  employing  optimal  numerical  block  U-L  factorization  for  manipulating  sparse  matrices 
[9, 10]  have  been  successfully  applied  to  equations  of  this  type,  but  the  overhead  of  numerical 
matrix  structure  analysis  can  be  excessive.  This  problem  is  circumvented  here  by  employing 
optimal  symbolic  U-L  factorization  to  develop  equations  which  recursively  yield  the  absolute 
and  relative  accelerations,  and  the  joint  reaction  forces.  This  method  requires  the  inversion  or 
decomposition  of  relatively  small  matrices  and  the  numerical  integration  of  a  minimum  number 
of coordinates. In those algorithms which use absolute coordinates exclusively, Newton-Raphson iteration is often employed to correct for constraint violations. This technique generally leads to numerical and convergence problems. The method in this paper avoids the use of Newton-Raphson iteration and can easily be implemented on a digital computer.

2.  RECURSIVE  KINEMATIC  EQUATIONS 

Figure 1 shows two deformable bodies, labeled i-1 and i, connected by a cylindrical joint. Reference coordinate systems $X^{i-1}Y^{i-1}Z^{i-1}$ and $X^{i}Y^{i}Z^{i}$, with origins $O^{i-1}$ and $O^{i}$, are introduced to define absolute displacement relative to a global frame. Let the global reference position vectors $\bar R^{i-1}$ and $\bar R^{i}$ locate the respective origins. For convenience in describing the connecting joint, introduce intermediate body-fixed joint coordinate systems $X^{i-1}_{d}Y^{i-1}_{d}Z^{i-1}_{d}$ and $X^{i}_{p}Y^{i}_{p}Z^{i}_{p}$ at the joint definition points, as also shown in Fig. 1. These intermediate joint coordinate systems are assumed to experience small displacements (due to body deformation) with respect to the reference and other coordinate systems fixed on the same body. Large relative displacements between coordinate systems on different bodies (due to joint displacements) are allowed and are described using the joint variables $\theta^{i,i-1}$ (rotation) and $\tau^{i,i-1}$ (translation).


Figure  1.  Intermediate  coordinate  systems 


Vectors and matrices can be represented in any coordinate system; throughout this paper it will be convenient to express them in body reference and global coordinates. Symbols with an overbar denote quantities expressed in global coordinates; all others are expressed in body reference coordinates. Let the orthonormal matrix $A^{i}$ relate the global and body reference coordinate systems, so that vector coordinates transform as $\bar a = A^{i}a$ and $a = A^{iT}\bar a$. In this paper, a given vector or matrix associated with body i is expressed only in these two coordinate systems, so no additional notation is required. The kinematic equations are initially derived in global coordinates and then transformed to body reference coordinates. In general, bold lower- and upper-case letters denote the algebraic representations of vectors and matrices, respectively. The symbols $\omega$, $\alpha$ and $\gamma$ also denote algebraic vector quantities.
Angular Velocity The absolute angular velocity of the body i reference coordinate system can be expressed in terms of that of body i-1 as [8]

$$
\bar\omega^{i} = \bar\omega^{i-1} + \bar\omega^{i-1}_{od} + \bar v_{dp}\,\dot\theta^{i,i-1} + \bar\omega^{i}_{po}
\tag{1}
$$

where the symbols are defined as follows:

- $\bar\omega^{i}$ and $\bar\omega^{i-1}$ are the angular velocity vectors of the body i and body i-1 reference coordinate systems, respectively, relative to a global inertial frame.
- $\bar\omega^{i-1}_{od}$ is the intermediate angular velocity of the joint coordinate system $X^{i-1}_{d}Y^{i-1}_{d}Z^{i-1}_{d}$ with respect to the body i-1 reference coordinate system.
- $\bar\omega^{i}_{po}$ is the intermediate angular velocity of the body i reference coordinate system $X^{i}Y^{i}Z^{i}$ with respect to the joint coordinate system $X^{i}_{p}Y^{i}_{p}Z^{i}_{p}$.
- $\bar v_{dp} = A^{i}v_{dp}$ is a unit vector lying along the joint axis of rotation/translation.

The angular velocity vectors $\bar\omega^{i-1}_{od}$ and $\bar\omega^{i}_{po}$ result from small rotations of the joint coordinate systems, due to body deformations, with respect to the corresponding body reference coordinate systems. It is more efficient to work in body reference coordinate systems, because many of the vector and matrix quantities there are constant. In addition, quantities in the body i-1 and i reference coordinate systems can be related by the matrix $A^{i,i-1} = A^{iT}A^{i-1}$ or through the basic identity


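$$
A^{i,i-1} = \left[I + \tilde v_{dp}\sin\theta^{i,i-1} + 2\,\tilde v_{dp}^{\,2}\sin^{2}\frac{\theta^{i,i-1}}{2}\right]A^{i,i-1}_{0}
\tag{2}
$$

where $A^{i,i-1}_{0}$ is a constant transformation matrix corresponding to the condition $\theta^{i,i-1} = 0$ [8].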
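The bracketed factor in Eq. 2 has the form of the Rodrigues rotation formula, with $\tilde v_{dp}$ the skew-symmetric matrix of the unit joint axis. The NumPy sketch below assumes exactly that form; it does not check the sign or transpose conventions used in [8]. It confirms that the factor is a proper rotation, that it leaves the joint axis unchanged, and that it reduces to the familiar rotation about $z$ when $v_{dp} = [0, 0, 1]^T$.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix v~ with skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def joint_rotation(v, theta):
    """I + v~ sin(theta) + 2 v~^2 sin^2(theta/2), for a unit axis v."""
    V = skew(v)
    return np.eye(3) + V * np.sin(theta) + 2.0 * (V @ V) * np.sin(theta / 2.0) ** 2

theta = 0.7
v = np.array([1.0, 2.0, 2.0]) / 3.0          # a unit joint axis chosen for illustration
R = joint_rotation(v, theta)
print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))
print(np.allclose(R @ v, v))                 # the joint axis is invariant

c, s = np.cos(theta), np.sin(theta)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
print(np.allclose(joint_rotation(np.array([0.0, 0.0, 1.0]), theta), Rz))
```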

A skew-symmetric matrix $\tilde a$, equivalent to the vector cross-product operator, is associated with each vector $a$ [8]. Thus Eq. 1 may be expressed in body i-1 and i reference coordinates as

$$
\omega^{i} = A^{i,i-1}\left[\omega^{i-1} + \omega^{i-1}_{od}\right] + v_{dp}\,\dot\theta^{i,i-1} + \omega^{i}_{po}
\tag{3}
$$


The intermediate angular velocity vectors, written in terms of the body i-1 and i elastic coordinates, are simply

$$
\omega^{i-1}_{od} = S^{i-1}_{\theta od}\,\dot q^{i-1}_{f}, \qquad \omega^{i}_{po} = S^{i}_{\theta po}\,\dot q^{i}_{f}
\tag{4}
$$

The constant matrices $S^{i-1}_{\theta od}$ and $S^{i}_{\theta po}$, as defined by Changizi and Shabana [11], depend only on the body shape functions and the relative location of points p and d on the bodies. Vectors $\dot q^{i-1}_{f}$ and $\dot q^{i}_{f}$ are the elastic coordinate derivatives of bodies i-1 and i, respectively. Throughout this paper, subscripts r, j and f denote reference, joint and flexible (elastic) coordinates, respectively.


Angular Acceleration Similar to the above development, the body i reference angular acceleration vector can be written in terms of that of body i-1 as

$$
\bar\alpha^{i} = \bar\alpha^{i-1} + \bar v_{dp}\,\ddot\theta^{i,i-1} + A^{i-1}S^{i-1}_{\theta od}\,\ddot q^{i-1}_{f} + A^{i}S^{i}_{\theta po}\,\ddot q^{i}_{f} + \bar\gamma^{i}_{\alpha}
\tag{5}
$$

where $\bar\alpha^{i-1}$ and $\bar\alpha^{i}$ are the angular acceleration vectors of the body reference coordinate systems $X^{i-1}Y^{i-1}Z^{i-1}$ and $X^{i}Y^{i}Z^{i}$, respectively. In addition,

$$
\bar\gamma^{i}_{\alpha} = \tilde{\bar\omega}^{i}\,\bar v_{dp}\,\dot\theta^{i,i-1} + \tilde{\bar\omega}^{i-1}A^{i-1}S^{i-1}_{\theta od}\,\dot q^{i-1}_{f} + \tilde{\bar\omega}^{i}A^{i}S^{i}_{\theta po}\,\dot q^{i}_{f}
\tag{6}
$$

absorbs the components of angular acceleration that are quadratic in first derivatives. The reference accelerations are with respect to the global inertial frame. Equations 5 and 6 may be expressed in body i-1 and i reference coordinates as


1  =  A'**'1 


a 


i-'  +  S 


Sod 


:  • -i.i-1 

+  vip0 


(7) 


Ye 


i  =  A‘-i-1 


•  i-1 

qr 


•u-i 

,  0  +  a>  S 


9po 


•  l 

Qf 


or  more  compactly  as 


(8) 


a 


‘sA*’*'1 


+h;p 


;i 


+  Ye* 


where 


“i.i-1 

x 


••i.i-1 

0 


and 


(9) 


(10) 


(11) 


Matrix  H0po  is  often  called  an  influence  coefficient  matrix.  Symbols  I  and  0  used  in  various 

matrix  expressions  refer  to  respective  identity  and  zero  matrices  whose  dimensions  are  implied 
by  the  accompanying  matrices  and  vectors. 
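The influence coefficient matrix collects the joint and elastic accelerations into a single product. The sketch below rests on several assumptions. It uses the ordering $\ddot p^{i} = [\ddot\tau^{i,i-1}, \ddot\theta^{i,i-1}, \ddot q^{i\,T}_{f}]^T$ and $H^{i}_{\theta po} = [0\;\; v_{dp}\;\; S^{i}_{\theta po}]$, which is our reading of Eqs. 10 and 11. It also uses a random stand-in for $S^{i}_{\theta po}$ and an arbitrary number of elastic coordinates. It checks that $H^{i}_{\theta po}\ddot p^{i}$ reproduces the $v_{dp}\ddot\theta^{i,i-1} + S^{i}_{\theta po}\ddot q^{i}_{f}$ terms of Eq. 7, and that the translational acceleration $\ddot\tau^{i,i-1}$ does not contribute to the angular acceleration.

```python
import numpy as np

rng = np.random.default_rng(3)
nf = 4                                    # hypothetical number of elastic coordinates of body i
v_dp = np.array([0.0, 0.6, 0.8])          # unit joint axis in body i coordinates (illustrative)
S_po = rng.standard_normal((3, nf))       # random stand-in for S^i_{theta po}
H_theta = np.hstack([np.zeros((3, 1)), v_dp[:, None], S_po])   # [0  v_dp  S_theta_po]

tau_dd, theta_dd = 0.3, -1.2              # joint accelerations (arbitrary values)
qf_dd = rng.standard_normal(nf)           # elastic accelerations (arbitrary values)
p_dd = np.concatenate([[tau_dd, theta_dd], qf_dd])

explicit = v_dp * theta_dd + S_po @ qf_dd # joint and elastic terms written out as in Eq. 7
compact = H_theta @ p_dd                  # the same terms through the influence matrix
print(np.allclose(explicit, compact))
```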
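Linear acceleration of body reference origin The linear acceleration of the body i reference origin $O^{i}$ can be written in matrix form as

$$
\ddot{\bar R}^{i} = \ddot{\bar R}^{i-1} + \tilde{\bar\alpha}^{i-1}\bar u^{i-1}_{od} + \tilde{\bar\omega}^{i-1}\tilde{\bar\omega}^{i-1}\bar u^{i-1}_{od} + 2\tilde{\bar\omega}^{i-1}\bar{\dot u}^{i-1}_{od} + \bar{\ddot u}^{i-1}_{od} + \bar v_{dp}\,\ddot\tau^{i,i-1} + \tilde{\bar\alpha}^{i}\bar u^{i}_{po} + \tilde{\bar\omega}^{i}\tilde{\bar\omega}^{i}\bar u^{i}_{po} + 2\tilde{\bar\omega}^{i}\bar{\dot u}^{i}_{po} + \bar{\ddot u}^{i}_{po}
\tag{12}
$$

where $\bar u^{i-1}_{od}$ is the position vector of the joint coordinate system $X^{i-1}_{d}Y^{i-1}_{d}Z^{i-1}_{d}$ with respect to $O^{i-1}$, and $\bar u^{i}_{po}$ is the position vector of the reference coordinate system $X^{i}Y^{i}Z^{i}$ with respect to the intermediate joint coordinate system $X^{i}_{p}Y^{i}_{p}Z^{i}_{p}$. Vectors $u^{i-1}_{od}$ and $u^{i}_{po}$ can be expressed in terms of the elastic coordinates of bodies i-1 and i, respectively. In body reference coordinates, Eq. 12 becomes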


1  —i-1  :  ,  — i-I  —i-1  j  ,  —i-1  .  j_i  .  .j_l 

R  1  +  a  u^1  +  co  to  uj  +  2  to  +  uod 


+  vdP* 


i.i-1  -i  :  ~i~i  •  ^  .  i 

+  Upo  +  ©  CO  Up<J  +  2  CO  Upo  +  Upo 


(13) 


With this knowledge, and using Eqs. 1 and 9, Eq. 13 can be written more compactly in body i reference coordinates as

$$\ddot{R}^i + \tilde{u}_{po}^i\alpha^i = A^{i,i-1}\left(\ddot{R}^{i-1} - \tilde{u}_{od}^{i-1}\alpha^{i-1} + S_{Rd}^{i-1}\,\ddot{q}_f^{i-1}\right) + H_R^i\,p^i + \gamma_R^i \qquad (14)$$

where $H_R^i$, defined in Eq. 15, is also an influence coefficient matrix and $\gamma_R^i$ absorbs acceleration components which are quadratic in first derivatives.
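Moving the angular-acceleration terms of Eq. 12 to the left-hand side of Eq. 14 relies on the skew-symmetric (tilde) matrix identity $\tilde{a}\,b = a \times b = -\tilde{b}\,a$, so that $\tilde{\alpha}^i u_{po}^i = -\tilde{u}_{po}^i \alpha^i$, which appears as $+\tilde{u}_{po}^i\alpha^i$ once moved to the left. The short Python sketch below checks that identity numerically; the helper names and test vectors are illustrative assumptions, not values from the paper.

```python
def tilde(v):
    """Skew-symmetric matrix of v, so that tilde(v) applied to w equals v x w."""
    x, y, z = v
    return [[0.0, -z, y],
            [z, 0.0, -x],
            [-y, x, 0.0]]


def matvec(m, v):
    """3x3 matrix times 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def cross(a, b):
    """Ordinary cross product a x b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


# Arbitrary test values (assumed, not taken from the paper)
alpha = [0.3, -1.2, 2.0]   # angular acceleration of body i
u_po = [0.5, 0.1, -0.4]    # position of O^i relative to the joint frame

lhs = matvec(tilde(alpha), u_po)                 # alpha~ u_po, as it appears in Eq. 12
rhs = [-x for x in matvec(tilde(u_po), alpha)]   # -u_po~ alpha, the form used in Eq. 14
print(lhs, rhs)
```

The same identity turns $\tilde{\alpha}^{i-1}u_{od}^{i-1}$ into the $-\tilde{u}_{od}^{i-1}\alpha^{i-1}$ term on the right of Eq. 14.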

3.  KINEMATIC  MATRIX  EQUATIONS 


The  first  step  toward  developing  recursive  kinematic  relationships  is  to  express  the 
second  derivatives  of  body  i  coordinates  explicitly  in  terms  of  those  of  body  i-1.  To  do  this, 
first  combine  Eqs.  9  and  14  in  matrix  form  as 


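$$\begin{bmatrix} I & \tilde{u}_{po}^i\\ 0 & I \end{bmatrix} \begin{bmatrix} \ddot{R}^i\\ \alpha^i \end{bmatrix} = \begin{bmatrix} A^{i,i-1} & -A^{i,i-1}\tilde{u}_{od}^{i-1} & A^{i,i-1}S_{Rd}^{i-1}\\ 0 & A^{i,i-1} & A^{i,i-1}S_{\theta d}^{i-1} \end{bmatrix} \begin{bmatrix} \ddot{R}^{i-1}\\ \alpha^{i-1}\\ \ddot{q}_f^{i-1} \end{bmatrix} + \begin{bmatrix} H_R^i\\ H_\theta^i \end{bmatrix} p^i + \begin{bmatrix} \gamma_R^i\\ \gamma_\theta^i \end{bmatrix} \qquad (16)$$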
Then multiply by the coefficient matrix inverse


and append the identity


where


$$H_f = [\,0 \;\; 0 \;\; I\,]$$

Finally, Eq. 19 may be written more compactly as

$$a^i = H^{i,i-1}\,a^{i-1} + H_p^i\,p^i + \gamma^i \qquad (26)$$
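Assuming the coefficient matrix of Eq. 16 has the block form $\begin{bmatrix} I & \tilde{u}_{po}^i\\ 0 & I \end{bmatrix}$, its inverse is available in closed form as $\begin{bmatrix} I & -\tilde{u}_{po}^i\\ 0 & I \end{bmatrix}$, so the multiplication step described above needs no numerical factorization. The Python sketch below, using arbitrary numbers and illustrative helper names, assembles both 6 by 6 matrices and confirms that their product is the identity.

```python
def tilde(v):
    """Skew-symmetric matrix of v (tilde(v) w = v x w)."""
    x, y, z = v
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]


def assemble(ul, ur, ll, lr):
    """Build a 6x6 matrix from four 3x3 blocks."""
    top = [ul[r] + ur[r] for r in range(3)]      # list concatenation per row
    bottom = [ll[r] + lr[r] for r in range(3)]
    return top + bottom


def matmul(a, b):
    """Dense matrix product for list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


I3 = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
Z3 = [[0.0] * 3 for _ in range(3)]

u_po = [0.5, 0.1, -0.4]                  # assumed joint-to-origin vector
ut = tilde(u_po)
minus_ut = [[-x for x in row] for row in ut]

C = assemble(I3, ut, Z3, I3)             # assumed coefficient matrix of Eq. 16
C_inv = assemble(I3, minus_ut, Z3, I3)   # closed-form inverse
product = matmul(C, C_inv)               # should be the 6x6 identity
```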


Formulations for revolute, prismatic and rigid joint kinematic equations, as well as for more sophisticated joint types [12], can be obtained as special cases of the cylindrical joint equations. For revolute, prismatic and rigid joints, respectively, $x^{i,i-1}$, $\theta^{i,i-1}$, or both $x^{i,i-1}$ and $\theta^{i,i-1}$ are constant.

4. GENERALIZED NEWTON-EULER EQUATIONS OF MOTION


Recently, several formulations have been developed for the dynamic analysis of deformable bodies undergoing large rotations. In this paper, the generalized Newton-Euler equations, which account for all inertia coupling between the reference motion and the elastic deformations, are used. The generalized Newton-Euler equations presented in [8] in terms of absolute reference and deformation coordinates are given for deformable body i in its reference frame as
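$$\begin{bmatrix} M_{RR}^i & M_{R\theta}^i & M_{Rf}^i\\ M_{R\theta}^{iT} & M_{\theta\theta}^i & M_{\theta f}^i\\ M_{Rf}^{iT} & M_{\theta f}^{iT} & M_{ff}^i \end{bmatrix} \begin{bmatrix} \ddot{R}^i\\ \alpha^i\\ \ddot{q}_f^i \end{bmatrix} = \begin{bmatrix} g_R^i\\ g_\theta^i\\ g_f^i \end{bmatrix} \qquad (33)$$

where

$$M_{RR}^i = m^i I \qquad (34)$$

$$M_{R\theta}^i = -\tilde{c}^i \qquad (35)$$

$$M_{Rf}^i = \int_{V^i} \rho^i S_f^i \, dV^i \qquad (36)$$

$$M_{\theta\theta}^i = \int_{V^i} \rho^i \tilde{u}^{iT} \tilde{u}^i \, dV^i \qquad (37)$$

$$M_{\theta f}^i = \int_{V^i} \rho^i \tilde{u}^i S_f^i \, dV^i \qquad (38)$$

$$M_{ff}^i = \int_{V^i} \rho^i S_f^{iT} S_f^i \, dV^i \qquad (39)$$

and $m^i$, $\rho^i$, $V^i$ and $S_f^i$ are, respectively, the total mass, mass density, volume and shape function of body i. Vector $u^i = u_o^i + u_f^i$ defines the position of any arbitrary point on the deformable body, where $u_o^i$ represents the undeformed position of that point and $u_f^i = S_f^i q_f^i$ gives the displacement of the point from its undeformed position. The effective mass moment relative to the body reference frame is

$$c^i = \int_{V^i} \rho^i u_o^i \, dV^i + M_{Rf}^i q_f^i \qquad (40)$$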

The coefficient matrix in Eq. 33 is symmetric and assumed positive definite. The right-hand side vectors $g_R^i$, $g_\theta^i$ and $g_f^i$ contain externally applied forces and moments, internal elastic and damping forces, and components of inertia forces which are quadratic in first derivatives of the coordinates [8, 13].
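The invariants in Eqs. 34-40 are volume integrals over the body. As a rough illustration only, the Python sketch below approximates $m^i$, $M_{Rf}^i$, $M_{ff}^i$ and $c^i$ for a hypothetical uniform bar lying along the body X axis, with a single assumed axial mode $S_f(x) = [x/L,\ 0,\ 0]^T$ and midpoint lumping of the mass. The bar, the mode, the function name `bar_invariants` and the quadrature are assumptions for illustration, not the paper's model; $M_{\theta\theta}^i$ and $M_{\theta f}^i$ follow the same pattern with the tilde matrix and are omitted.

```python
def bar_invariants(mass, length, q_f, n=100):
    """Lumped-mass approximation of Eqs. 34, 36, 39 and 40 for a uniform bar
    on the X axis with one assumed axial mode S_f(x) = [x/L, 0, 0]^T."""
    dm = mass / n                     # mass of each of n equal segments
    m_total = 0.0
    M_Rf = [0.0, 0.0, 0.0]            # ~ integral of rho * S_f dV   (3 x 1)
    M_ff = 0.0                        # ~ integral of rho * S_f^T S_f dV  (1 x 1)
    first_moment = [0.0, 0.0, 0.0]    # ~ integral of rho * u_o dV
    for k in range(n):
        x = (k + 0.5) * length / n    # segment midpoint, undeformed
        s = [x / length, 0.0, 0.0]    # assumed shape function at x
        u_o = [x, 0.0, 0.0]           # undeformed position of the point
        m_total += dm
        for j in range(3):
            M_Rf[j] += dm * s[j]
            first_moment[j] += dm * u_o[j]
        M_ff += dm * sum(sj * sj for sj in s)
    M_RR = [[m_total if r == c else 0.0 for c in range(3)] for r in range(3)]  # Eq. 34
    c_vec = [first_moment[j] + M_Rf[j] * q_f for j in range(3)]             # Eq. 40
    return M_RR, M_Rf, M_ff, c_vec


M_RR, M_Rf, M_ff, c_vec = bar_invariants(mass=2.0, length=1.5, q_f=0.01)
```

For this bar the closed-form values are $M_{Rf}^i = [m/2,\ 0,\ 0]^T$, $M_{ff}^i = m/3$ and $c^i = [m(L + q_f)/2,\ 0,\ 0]^T$; the midpoint sums reproduce the first and last up to rounding, because their integrands are linear, and approach the second as n grows.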

Using  Eq.  33  and  assuming  for  this  discussion  that  body  i  is  at  the  end  of  a  chain  of 
elements  so  that  it  contains  only  one  joint  common  to  it  and  body  i-1,  the  generalized  Newton- 
Euler  equations  can  be  written  as 


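$$M^i a^i = g^i + f^i \qquad (41)$$

where $M^i$ is the coefficient matrix of Eq. 33,

$$g^i = \left[\, g_R^{iT} \;\; g_\theta^{iT} \;\; g_f^{iT} \,\right]^T, \qquad a^i = \left[\, \ddot{R}^{iT} \;\; \alpha^{iT} \;\; \ddot{q}_f^{iT} \,\right]^T \qquad (42)$$

and the vector

$$f^i = \left[\, f_R^{iT} \;\; f_\theta^{iT} \;\; f_f^{iT} \,\right]^T \qquad (43)$$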

contains  the  internal  reaction  forces  at  the  joint  interface  between  the  two  bodies  [2,  8,  13]. 
Using  matrix  Hp  from  Eq.  30,  one  can  write  the  following  equation 

$$H_p^{iT} f^i = Q^i \qquad (44)$$

where

$$Q^i = \left[\, Q_j^{iT} \;\; 0^T \,\right]^T \qquad (45)$$


contains  the  vector  of  joint  generalized  forces  acting  parallel  or  tangent  to  the  constraint  surface. 
The  second  equation  resulting  from  Eqs.  44  and  45  yields  the  dynamic  force  balance  relation 


(46) 


Note that the last part of $Q^i$, corresponding to the elastic generalized coordinates, is zero because all elastic generalized forces were included in $g_f^i$. This arbitrary choice was made to simplify Eq. 46 and yields the same result, as is evident from Eq. 41. Examples of joint generalized forces are actuator forces in prismatic joints, motor torques in revolute joints, and friction forces and torques in joints.

5.  SPARSE  MATRIX  FORMULATION 

In this section, a sparse-matrix-oriented technique for solving the kinematic and force relationships of the preceding sections is developed. Duff et al. [10] have shown that optimal block permutation can minimize block matrix fill-in during U-L factorization, which is equivalent to minimizing computational overhead. The purpose of the remaining sections of this paper is to establish patterns which will be applicable to the recursive solution of multibody systems with an arbitrary number of bodies. First, Eqs. 26, 41 and 44 can be combined in matrix form to obtain a large, sparse system of equations in terms of the absolute and joint coordinates as


in  Eq.  47  and  noting  that 


$$\begin{bmatrix} M^i & -I\\ I & 0 \end{bmatrix}^{-1} = \begin{bmatrix} 0 & I\\ -I & M^i \end{bmatrix} \qquad (53)$$

it  follows  that 


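$$\begin{bmatrix} M^i & -I & 0\\ I & 0 & -H_p^i\\ 0 & H_p^{iT} & 0 \end{bmatrix}^{-1} = \begin{bmatrix} G^i & P^{iT} & E^i\\ -P^i & \hat{M}^i & F^i\\ E^{iT} & -F^{iT} & D^i \end{bmatrix} \qquad (54)$$

where

$$D^i = \left(H_p^{iT} M^i H_p^i\right)^{-1} \qquad (55)$$

$$E^i = H_p^i D^i \qquad (56)$$

$$F^i = M^i E^i \qquad (57)$$

$$G^i = E^i H_p^{iT} \qquad (58)$$

$$P^i = I - F^i H_p^{iT} \qquad (59)$$

$$\hat{M}^i = P^i M^i \qquad (60)$$

so that

$$\begin{bmatrix} a^i\\ f^i\\ p^i \end{bmatrix} = \begin{bmatrix} G^i & P^{iT} & E^i\\ -P^i & \hat{M}^i & F^i\\ E^{iT} & -F^{iT} & D^i \end{bmatrix} \begin{bmatrix} g^i\\ \gamma^i + H^{i,i-1}a^{i-1}\\ Q^i \end{bmatrix} \qquad (61)$$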


Matrix $P^i$ is a projection matrix and $\hat{M}^i$ is a projected inertia.

Assuming that $a^{i-1}$ is known, Eq. 61 yields $a^i$, $f^i$ and $p^i$. However, the first two equations of Eq. 61 are not required, because it is less expensive to obtain $a^i$ and $f^i$ directly from Eqs. 26 and 41 once $p^i$ is known.
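The block inverse of Eqs. 54-61 can be checked mechanically. The Python sketch below assumes a single body with a one-degree-of-freedom joint, so that $H_p^i$ is a single column and $D^i$ is a 1 by 1 inverse, and uses exact rational arithmetic. It forms $D^i$ through $\hat{M}^i$ as in Eqs. 55-60, verifies that the coefficient matrix $\begin{bmatrix} M^i & -I & 0\\ I & 0 & -H_p^i\\ 0 & H_p^{iT} & 0 \end{bmatrix}$ (Eqs. 41, 26 and 44 stacked, with right side $[g^i;\ \gamma^i + H^{i,i-1}a^{i-1};\ Q^i]$) times $\begin{bmatrix} G^i & P^{iT} & E^i\\ -P^i & \hat{M}^i & F^i\\ E^{iT} & -F^{iT} & D^i \end{bmatrix}$ is the identity, and then follows the cheaper route described above: $p^i$ from the last row of Eq. 61, then $a^i$ from Eq. 26 and $f^i$ from Eq. 41. The numerical values and helper names are hypothetical.

```python
from fractions import Fraction as Fr


def mm(A, B):
    """Matrix product for list-of-lists matrices (exact with Fractions)."""
    return [[sum((A[i][k] * B[k][j] for k in range(len(B))), Fr(0))
             for j in range(len(B[0]))] for i in range(len(A))]


def tr(A):
    return [list(row) for row in zip(*A)]


def lin(A, B, s=1):
    """Return A + s * B elementwise."""
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def neg(A):
    return [[-a for a in row] for row in A]


def eye(n):
    return [[Fr(int(i == j)) for j in range(n)] for i in range(n)]


def zeros(r, c):
    return [[Fr(0)] * c for _ in range(r)]


def blocks(grid):
    """Assemble a matrix from a 2-D grid of blocks (equal heights per block row)."""
    return [sum((blk[r] for blk in brow), []) for brow in grid
            for r in range(len(brow[0]))]


def projection_blocks(M, Hp):
    """D, E, F, G, P and the projected inertia P M (Eqs. 55-60); Hp is n x 1."""
    D = [[1 / mm(mm(tr(Hp), M), Hp)[0][0]]]   # (Hp^T M Hp)^-1, 1x1 here
    E = mm(Hp, D)
    F = mm(M, E)
    G = mm(E, tr(Hp))
    P = lin(eye(len(M)), mm(F, tr(Hp)), -1)   # I - F Hp^T
    return D, E, F, G, P, mm(P, M)


def check_inverse(M, Hp):
    """True if the stacked coefficient matrix times the Eq. 54 inverse is I."""
    n = len(M)
    D, E, F, G, P, M_hat = projection_blocks(M, Hp)
    K = blocks([[M, neg(eye(n)), zeros(n, 1)],
                [eye(n), zeros(n, n), neg(Hp)],
                [zeros(1, n), tr(Hp), zeros(1, 1)]])
    X = blocks([[G, tr(P), E],
                [neg(P), M_hat, F],
                [tr(E), neg(tr(F)), D]])
    return mm(K, X) == eye(2 * n + 1)


def solve_body(M, Hp, g, c, Q):
    """p from the last row of Eq. 61, then a (Eq. 26) and f (Eq. 41).
    c stands for gamma^i + H^{i,i-1} a^{i-1}, assumed already known."""
    D, E, F, _, _, _ = projection_blocks(M, Hp)
    p = lin(lin(mm(tr(E), g), mm(tr(F), c), -1), mm(D, Q))
    a = lin(mm(Hp, p), c)        # a = Hp p + c
    f = lin(mm(M, a), g, -1)     # f = M a - g
    return a, f, p


# Hypothetical data: symmetric positive definite M, one joint column Hp
M = [[Fr(4), Fr(1), Fr(0)], [Fr(1), Fr(3), Fr(1)], [Fr(0), Fr(1), Fr(2)]]
Hp = [[Fr(1)], [Fr(0)], [Fr(1)]]
g = [[Fr(1)], [Fr(2)], [Fr(3)]]
c = [[Fr(0)], [Fr(1)], [Fr(-1)]]
Q = [[Fr(5)]]
a, f, p = solve_body(M, Hp, g, c, Q)
```

Only the 1 by 1 matrix $H_p^{iT} M^i H_p^i$ is inverted here; for a joint with k degrees of freedom this becomes a k by k inverse, which is what keeps the per-body cost independent of the number of bodies.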
6. ALTERNATE MATRIX REPRESENTATION

As pointed out earlier, Eq. 47 yields the largest and least coupled system of equations, where advantage is taken of the identities in Eqs. 48-60. However, some of the intermediate matrix operations involved in evaluating the inverse in Eq. 54 are still quite large. It is possible to obtain a smaller, less sparse equation system by eliminating the second appearance of $\ddot{q}_f^i$ from $p^i$ on the left side of Eq. 47 (see Eqs. 10, 18, 25, 30-32, 47 and 61). To this end, write the projection

$$\begin{bmatrix} I & 0 & 0 & 0 & 0\\ 0 & I & 0 & 0 & 0\\ 0 & 0 & I & 0 & 0\\ 0 & 0 & 0 & I & 0\\ 0 & 0 & 0 & 0 & I\\ 0 & I & 0 & 0 & 0 \end{bmatrix} \qquad (62)$$


Substituting  Eq.  62  into  Eq.  47,  premultiplying  by  the  transpose  of  the  coefficient  matrix  in 
Eq.  62,  eliminating  the  resulting  null  block  row  and  column  and  permuting  gives  the  following 
reduced  set  of  equations 
[Equations (63)-(65) are illegible in the source.]
Observe that the upper left 3 by 3 block of matrices in Eq. 63 has dimension 12 plus the number of degrees of freedom in the joint connecting bodies i-1 and i, and is nonsingular for well-posed problems. This matrix is easily inverted following the steps outlined in Eqs. 48-60.

Furthermore, the lower matrix $M_{ff}^i$ is constant and positive definite, and thus all quantities can be evaluated when $a^{i-1}$ is known. In addition, note that the unknown constraint deformation force vector $f_f^i$ has been eliminated from Eq. 63 but can be evaluated, if desired, from Eq. 46 once $f_r^i$ has been determined. Use the last equation in Eq. 63 to solve for




Then, with matrix $M^i$ positive definite and $H_p^i$ of full column rank, it can be shown [14] that the corresponding matrices are positive definite, which guarantees the nonsingularity of A. With this, it is straightforward to evaluate a set of matrices, similar to Eqs. 55-60, which represent the inverse of the coefficient matrix in Eq. 67 as


[Equation (71) is illegible in the source.]


Similar  to  Eq.  61,  it  follows  that 


[Equation (72) is illegible in the source.]

where all three equations must be used here because the first two in Eq. 63 depend on $\ddot{q}_f^i$, which, according to Eq. 66, also depends on $a_r^i$ and $f_r^i$. One might suggest that this problem could have been avoided by first eliminating $a_r^i$, $f_r^i$ and $p^i$ from Eq. 63. However, this idea was discarded because it requires the repetitive inversion of a much larger matrix, the size of $M_{ff}^i$, in order to evaluate $\ddot{q}_f^i$ (recall that $(M_{ff}^i)^{-1}$ is constant and must be evaluated only once).

7. CONNECTIVITY CONDITIONS AND PROJECTION METHODS

Let body i-1 be located between bodies i-2 and i in a chain of elements. Then a dynamic equilibrium equation similar to Eq. 41 can be written as

$$M^{i-1}a^{i-1} = g^{i-1} + f^{i-1} - H^{i,i-1\,T} f^i \qquad (73)$$

where the transformation $H^{i,i-1\,T}$ brings the reaction forces at the joint between bodies i-1 and i to the common reference coordinates and origin of body i-1. The dynamic equilibrium of bodies i and i-1 taken together is described by Eqs. 41 and 73, combined in matrix form as


$$\begin{bmatrix} M^{i-1} & 0\\ 0 & M^i \end{bmatrix} \begin{bmatrix} a^{i-1}\\ a^i \end{bmatrix} - \begin{bmatrix} I & -H^{i,i-1\,T}\\ 0 & I \end{bmatrix} \begin{bmatrix} f^{i-1}\\ f^i \end{bmatrix} = \begin{bmatrix} g^{i-1}\\ g^i \end{bmatrix} \qquad (74)$$


Likewise  one  may  extend  Eqs.  26  and  44  to  obtain 


Now  Eqs.  74-76  may  be  combined  and  permuted  into 


(77) 

Following  the  steps  leading  to  Eqs.  62  and  63,  Eq.  77  can  also  be  reduced  to 


(78) 
The reader is encouraged to carefully study the structure of Eqs. 77 and 78 to identify the minimal coupling between the two major blocks. These equations may be extended to any number of bodies by adding blocks along the major diagonal and corresponding coupling matrices at the row/column intersections corresponding to their joint adjacency. For serial mechanisms, the overall matrix bandwidth will always be the same as in Eqs. 77 and 78. Regardless of the matrix bandwidth (the degree of system serialism or parallelism), the computational overhead per body for open-loop systems will always be the same when the matrix equations have been permuted for optimal U-L factorization. That is, elimination starts in the lower right-hand corner and back substitution starts in the upper left-hand corner of the composite matrix. To further illustrate the recursive elimination procedure, Eq. 78 is solved for all unknown quantities. This procedure can then be extended to any number of bodies.


Since matrices $M_{ff}^{i-1}$ and $M_{ff}^i$ are constant and assumed nonsingular, the accelerations $\ddot{q}_f^{i-1}$ and $\ddot{q}_f^i$ are first eliminated nonrecursively, as in Eq. 66, leaving a system of equations with a structure similar to that of a system involving only rigid bodies (refer to Eq. 67). To this end, eliminate $\ddot{q}_f^{i-1}$ and $\ddot{q}_f^i$, giving
where

The remaining submatrices are obtained from Eqs. 68-70 with i replaced by i-1.

Eliminating the unknowns $a_r^i$, $f_r^i$ and $p^i$ following the procedures outlined earlier yields the reduced system of equations

The superscript "e" used in the above equations denotes an equivalent quantity. Compare the structure of Eqs. 67 and 82. Equations 83-88 clearly show that the elimination process generates equivalent matrix and vector replacement quantities only for the body which holds the eliminated element. That is, the properties of the eliminated body are projected across the joint onto its parent. For open kinematic-loop systems, and as a consequence of optimal block U-L factorization, each stage of elimination generates a further reduced system of equations whose structure is identical to that of an equivalent system with the corresponding body removed.

Using  the  procedures  developed  in  this  paper,  one  may  generalize  Eqs.  66-88  to 
systems  composed  of  any  number  of  rigid  and  deformable  bodies  interconnected  by  joints. 
Space  limitations  do  not  allow  a  comprehensive  development  of  recursive  solution  algorithms 
for  arbitrary  systems  of  interconnected  rigid  and  flexible  bodies  and  this  paper  does  not  address 
the  steps  required  to  handle  closed  kinematic-loop  systems  [2,  4].  While  the  coefficient 
submatrix  dimensions  in  Eqs.  63, 77  and  78  depend  on  the  individual  body  elastic  degrees  of 
freedom,  the  matrix  dimensions  in  Eqs.  67,  79  and  82  are  the  same  whether  bodies  are 
deformable  or  not.  Only  the  submatrix  structures  and  thus  the  underlying  recursive  solution 
algorithms  differ. 
8. SUMMARY


A  method  is  presented  for  effective  solution  of  equations  of  motion  for  systems  of 
interconnected  rigid  and  deformable  bodies.  The  equations  of  motion  of  each  body  in  the 
system  are  formulated  in  terms  of  the  absolute  coordinates  using  generalized  Newton-Euler 
equations. These equations, which contain the nonlinear inertia coupling between the rigid
body motion and the small elastic deformation, are expressed in terms of a set of invariants
which depend on the assumed displacement field. Recursive kinematic relationships in which
the absolute variables of body i are expressed in terms of those of body i-1 and joint variables
are  also  developed.  The  matrix  relating  absolute  and  relative  coordinates  is  used  to  define  joint 
forces  which  act  tangent  or  parallel  to  the  constraint  surfaces.  These  forces  are  the  generalized 
forces  associated  with  the  joint  generalized  coordinates.  By  combining  the  generalized 
Newton-Euler  equations,  the  kinematic  relationships  and  the  generalized  joint  force  equations, 
a  large  system  of  loosely  coupled  equations  is  obtained.  Matrix  partitioning,  optimal  block 
factorization  and  recursive  projection  methods  can  then  be  employed  to  obtain  an  order  n 
solution  for  the  constrained  system  equations  of  motion.  The  formulation  presented  in  this 
paper  can  be  applied  to  arbitrary  systems  with  rigid  and  flexible  elements,  and  numerous 
kinematic  joint  types. 
Abstract

Object-oriented  symbolic  computation  methods  are  developed  in  this  paper  for 
describing  and  analyzing  multibody  systems,  particularly  vehicles.  Computer  data 
objects  are  defined  for  symbolically  representing  (1)  vector/dyadic  algebraic  expressions, 
(2)  physical  components  in  a  multibody  system,  and  (3)  program  structures  needed  in  a 
simulation  code.  With  more  powerful  symbolic  manipulation  capabilities,  all  techniques 
normally  employed  by  human  analysts  and  programmers  can  be  mimicked  to  obtain 
efficient  numerical  simulation  codes.  These  include:  selecting  “natural”  coordinates, 
dropping  negligible  terms,  and  introducing  intermediate  variables  to  avoid  redundant 
computations.  Also,  the  description  of  unusual  forces  and  moments  is  straightforward 
when  the  analysis  software  can  deal  with  general  vector  notation.  The  methods  are 
demonstrated  for  an  example  three-dimensional  vehicle  handling  model. 


Introduction 

The  job  of  simulating  a  multibody  mechanical  system  breaks  down  into  two  tasks:  (1) 
formulate  equations  of  motion  and  (2)  solve  them  numerically.  The  automated  numerical 
solution  of  differential  equations  is  a  well  developed  area  in  engineering,  and  a  great  deal 
of  software  is  available  for  performing  this  work.  It  is  accomplished  by  a  simulation 
code — a  computer  program  written  to  numerically  simulate  a  multibody  system  by 
integrating  nonlinear  ordinary  differential  equations  over  a  small  time  step  hundreds  or 
thousands  of  times  in  a  “run.” 

The  efficiency  of  the  simulation  code  is  mainly  determined  by  the  number  of 
arithmetic  operations  employed  to  compute  derivatives  of  state  variables  at  each  time 
step — the  equations  of  motion. 

Approaches  that  are  taken  to  simulate  a  system  can  be  organized  into  three  categories: 

1.  Equations  of  motion  of  the  multibody  system  are  derived  by  an  analyst  and 
translated  by  a  programmer  into  a  specialized  simulation  code  that  pertains  to  one 
particular  multibody  system. 
2.  A generalized simulation code is used in which the equations have been
formulated  and  programmed  once  and  for  all  in  a  generalized  fashion. 

3.  Symbolic  analysis  software  is  used  to  aid  the  analyst  and  programmer  in  the 
formulation  of  equations  and  the  development  of  a  specialized  simulation  code. 

The  manual  derivation  of  the  equations  of  motion  for  even  a  modestly  complex 
system  (say,  five  or  more  degrees  of  freedom)  is  a  tedious  undertaking  that  involves 
considerable  algebra,  a  nagging  uncertainty  of  the  correctness  of  the  equations,  and  a 
considerable programming and debugging effort. To avoid these problems, numerous
multibody computer programs exist which build the equations of motion for a particular
system automatically, freeing the engineer to concentrate on modelling considerations
and parameter values. As indicated by the above categories, some multibody analysis
programs operate numerically while others operate symbolically.

Generalized  Simulation  Codes 

The (numerical) generalized simulation codes begin by building a set of equations
based on a multibody formulation that has been derived once and for all, and then
they proceed to numerically integrate the equations to simulate the system [1, 2]. These
generalized  codes  are  appealing  to  many  engineers  because  they  offer  a  “complete 
solution”  that  handles  the  entire  simulation  effort,  from  model  formulation  to  the 
numerical  integration  of  equations.  Of  course,  there  are  some  trade-offs  made  to  achieve 
the  generality. 

One  trade-off  is  that  the  generalized  codes  run  slowly  relative  to  specialized 
simulation  codes.  A  human  dynamicist  usually  tries  to  obtain  equations  of  motion  that 
are  as  simple  as  possible,  using  a  number  of  techniques  that  will  be  detailed  later. 
Further,  good  programmers  can  improve  computational  efficiency  when  the  equations  are 
incorporated into the simulation code. Because the general-purpose simulation code was
written once and for all to cover all multibody systems, most of the simplification
techniques cannot be used. For vehicle simulations, the eventual difference in simulation
speed  between  a  special-purpose  code  and  a  generalized  code  can  be  more  than  an  order 
of  magnitude  (preliminary  work  shows  a  factor  ranging  from  10  to  over  100).  The 
inefficiency  of  the  general-purpose  software  precludes  its  use  for  highly  repetitive  design 
studies  and  real-time,  hardware-in-the-loop  operations. 

Another  trade-off  is  that  the  generalized  codes  are  not  completely  generalized  when  it 
comes  to  introducing  force-  and  moment-producing  components.  This  can  be  a  problem 
with  multibody  systems  that  include  elements  characterized  by  semi-empirical  models 
that  are  not  likely  to  have  been  fully  anticipated  by  the  programmer.  E.g.,  ground 
vehicles  include  tires,  nonlinear  springs,  complex  shock  absorbers,  etc.  that  are  modelled 
differently  based  on  the  intended  use  of  the  simulation.  Assuming  that  an  engineer  is 
able  to  develop  a  computer  representation  of  such  an  element  as  an  external  subroutine, 
the  subroutine  must  be  incorporated  into  the  multibody  simulation.  If  the  simulation 
program  is  written  by  hand,  it  is  a  simple  matter  to  incorporate  external  subroutines. 
However, for generalized simulation codes, external subroutines are limited to cases
that were anticipated by the original programmer. Variables needed as inputs to the
external subroutine (positions, angles, speeds, etc.) are not always readily available.

Symbolic  Analysis  by  Computer 

Symbolic  computation  offers  the  potential  to  combine  the  high  reliability  of  a 
general-purpose  code  with  the  efficiency  and  modeling  flexibility  associated  with  the 
development  of  a  new  special-purpose  code.  In  this  approach,  a  simulation  code  is 
generated  by  the  computer  that  is  similar  in  structure  and  efficiency  to  one  written  by  a 
human  programmer. 

There  are  two  approaches  that  have  been  taken  for  performing  the  symbolic 
computation  needed  for  analyzing  multibody  systems: 

1.  A  generic  symbolic  manipulation  language  is  used  by  a  dynamicist  who  performs 
the  analysis  in  the  same  manner  as  would  be  done  “by  hand,”  except  that  the 
computer  aids  in  performing  the  algebra. 

2.  A  complete,  self-contained  multibody  analysis  program  is  used  to  formulate 
equations  automatically,  based  on  a  description  of  how  bodies  in  the  multibody 
system  are  connected  to  each  other. 

Generic symbolic mathematics software (e.g., MACSYMA, REDUCE, Mathematica)
has been employed to develop equations of motion for multibody systems [3, 4]. These
languages  include  capabilities  far  beyond  the  basic  “high-school  algebra”  needed  for 
analyzing  multibody  systems,  and  powerful  computers  are  required  for  acceptable 
performance.  However,  these  languages  do  not  include  provisions  for  optimizing 
numerical  analysis  computer  code. 

With  a  sufficiently  detailed  multibody  formalism,  equations  of  motion  can  be 
developed  automatically  using  only  rudimentary  computer  algebra.  Self-contained 
symbolic  multibody  codes  have  been  written  to  formulate  equations  that  can  be  merged 
into  a  simulation  program  (e.g.,  NEWEUL,  SD/FAST)  [5,  6,  7,  8].  However,  if  the 
symbolic  manipulation  is  too  limited,  some  important  simplification  methods  cannot  be 
applied.  Simplification  techniques  that  are  not  included  in  the  computer  algebra  can  still 
be  applied  by  including  them  in  the  multibody  formalism,  but  there  is  a  loss  of  modeling 
flexibility  because  the  formalism  must  include  specific  “plans”  for  dealing  with  all  types 
of  systems  being  modeled. 

This  paper  describes  a  new  approach  to  automating  the  symbolic  analysis  of 
multibody  systems.  A  symbolic  mathematics  language  is  designed  specifically  for 
analyzing  multibody  systems  and  generating  numerical  simulation  codes.  The  language 
directly  represents  three  aspects  of  the  overall  system  in  symbolic  form: 

1.  vector and dyadic algebra expressions,

2.  components of the multibody system (bodies, forces, etc.), and

3.  pieces of computer code that go into the numerical simulation code being
generated.
Techniques are presented for representing and manipulating these components as
computer data objects. A software package called AUTOSIM has been developed in Lisp
at The University of Michigan to apply these techniques and automatically generate
simulation codes. AUTOSIM is used to illustrate some of the techniques for an example
three-dimensional vehicle handling model.

To  provide  a  background  for  the  symbolic  analysis  methods,  considerations  of 
numerical  efficiency  are  presented.  More  background  is  provided  by  a  summary  of  the 
multibody  dynamics  formalism  that  is  automated  to  derive  equations  of  motion. 

Numerical  Efficiency 

A  simulation  code  is  a  computer  program  that  simulates  a  physical  system  by 
numerically  integrating  differential  equations  of  motion  for  the  system  of  interest.  The 
integration  is  performed  by  using  a  numerical  approximation  to  integrate  the  equations 
over  a  very  small  increment  of  time,  which  is  “stepped”  from  a  start  time  to  a  stop  time  in 
a  simulation  run.  Numerical  efficiency  is  quantified  by  the  number  of  arithmetic 
operations  needed  to  compute  derivatives  of  the  state  variables  of  the  multibody  system  at 
each  time  step.  This  efficiency  derives  from  (1)  the  formulation  of  the  differential 
equations,  and  (2)  the  programming  style  of  the  simulation  code. 

Formulation  Options 

Choices  made  by  the  analyst  deriving  equations  of  motion  have  a  direct  impact  on  the 
complexity  of  the  resulting  equations.  Some  of  the  techniques  that  are  typically 
employed  to  simplify  equations  are  the  following: 

1.  State  variables  are  introduced  that  are  “natural”  to  the  system  being  analyzed 
(joint  displacements,  speeds  oriented  in  body-based  directions,  Euler  angles,  etc.), 
avoiding  transformations  to  a  predefined  choice  (e.g.,  Cartesian  global 
coordinates). 

2.  Terms  which  are  known  to  be  zero  for  the  specific  system  (but  which  could  be 
non- zero  for  a  more  general  formulation)  are  omitted  from  the  equations. 

3.  Forces  and  moments  that  cancel  due  to  symmetry  or  because  they  involve  no 
work  are  eliminated  when  possible.1 

4.  Equations are written in “factored form,” involving products and ratios of sums of
terms. For example, the expression $(A + B + C)^2$ requires two additions and one
integer power, whereas the expanded form $(A^2 + 2AB + B^2 + 2AC + 2BC + C^2)$ requires
five additions, six multiplications, and three integer powers.

5.  Terms involving products or powers of quantities known to be “small” are
dropped if they are of order 2 or higher. In many mechanical systems, some of
the motions are limited such that variables associated with those motions are
much smaller than other expressions arising in the equations of motion.

6.  Trigonometric functions of small quantities are replaced with truncated Taylor
series expansions.

1  It should be noted that this technique is not always effective at simplifying equations. By eliminating
non-working forces and moments, the number of equations is reduced but the complexity of the equations
is increased. The question of whether large sets of simple equations are better or worse than small sets of
complicated equations has not been resolved, and is a topic of current research. However, multibody
formalisms that include the constraint forces and moments are much more complicated than the one
presented in the next section, and have not yet been shown to be effective when implemented symbolically.
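The operation counts quoted in item 4 can be reproduced by walking a parse tree. The Python sketch below uses the standard `ast` module to count additions, multiplications and integer powers in an expression string; counting `**` with an integer exponent as one integer power and `2*A*B` as two multiplications is an assumed counting convention, chosen to match the text.

```python
import ast


def count_ops(expr):
    """Count binary +, * and ** operations in a Python expression string."""
    counts = {"add": 0, "mul": 0, "pow": 0}
    kinds = {ast.Add: "add", ast.Mult: "mul", ast.Pow: "pow"}
    for node in ast.walk(ast.parse(expr, mode="eval")):
        if isinstance(node, ast.BinOp) and type(node.op) in kinds:
            counts[kinds[type(node.op)]] += 1
    return counts


factored = count_ops("(A + B + C)**2")
expanded = count_ops("A**2 + 2*A*B + B**2 + 2*A*C + 2*B*C + C**2")
print(factored, expanded)
```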

Technique no. 2 (removing zero terms) can only be partially implemented (via the use
of sparse matrix operations) in generalized numerical multibody simulation methods.
However, virtually all symbolic multibody programs employ it. Techniques 1-4 have
been used by some programs, and techniques 5 and 6 have not been used in a generalized
sense until the implementation described in this paper. (In past work, “small” variables,
when used, are built into the multibody formalism. The analyst cannot utilize knowledge
that some variables and parameters are small and that others are not.)
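To give a feel for techniques 5 and 6, the Python sketch below compares $\sin x$ and $\cos x$ with their truncated Taylor expansions $x$ and $1 - x^2/2$ at a few small angles; the chosen angles and the function name `truncation_errors` are arbitrary illustrations. The truncation error is bounded by the first omitted term, $|x|^3/6$ and $x^4/24$, respectively.

```python
import math


def truncation_errors(degrees):
    """Absolute errors of sin(x) ~ x and cos(x) ~ 1 - x^2/2 at the given angles."""
    rows = []
    for deg in degrees:
        x = math.radians(deg)
        err_sin = abs(math.sin(x) - x)
        err_cos = abs(math.cos(x) - (1.0 - 0.5 * x * x))
        rows.append((deg, x, err_sin, err_cos))
    return rows


for deg, x, es, ec in truncation_errors([1, 5, 10]):
    print(f"{deg:3d} deg: sin error {es:.2e}, cos error {ec:.2e}")
```

Whether such errors are acceptable depends on how "small" the variable really is for the system at hand, which is exactly the knowledge the analyst is expected to supply.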

Programming  Options 

A  given  set  of  equations  can  be  programmed  into  a  simulation  code  so  as  to  minimize 
computation.  Techniques  routinely  employed  by  human  programmers  are  the  following: 

7.  Complicated  expressions  that  occur  in  several  places  are  replaced  with 
intermediate  variables.  This  technique  is  particularly  important  for  multibody 
systems because the equations of motion are inherently redundant. Some of the
redundancy is eliminated by using a recursive dynamics analysis method. Even
so, inspection of the equations of motion usually reveals that some
subexpressions  appear  more  than  once.  A  human  programmer,  concerned  with 
numerical  efficiency,  will  try  to  avoid  performing  the  same  computation  more 
than  once  by  saving  the  results  the  first  time  and  then  using  the  result  when  the 
same  computation  is  called  for  again. 

8.  Constant  expressions  are  “precomputed”  to  avoid  performing  identical 
computations  over  and  over  with  each  time  step.  In  previously  developed 
symbolic  analysis  methodologies,  simpler  equations  are  obtained  by  specifying 
numerical  values,  rather  than  symbols,  for  parameters.  During  the  manipulation 
of  the  symbolic  expressions,  the  numbers  are  combined  and  the  complexity  of  the 
equations  is  reduced  [6,  8].  However,  this  approach  results  in  a  simulation  code 
that  is  “hard-wired”  for  one  set  of  parameter  values,  and  which  cannot  be  used  for 
parameter  sensitivity  studies. 

A  more  general  approach  is  to  identify  expressions  involving  constants  and 
introduce  intermediate  constants.  In  a  simulation  code,  these  constants  can  be 
precomputed  as  part  of  the  program  initialization. 

9. A human programmer will (hopefully) not introduce code that serves no purpose.
This obvious technique can be difficult to implement in an automated analysis
method. For example, details of the dynamics analysis are often recursive.
Consequently, some expressions are developed so that they can be referenced in a
later stage of the recursion. However, if the recursion stops, they may not be
needed. As another example, an expression might be developed which is later
multiplied by zero. Determining whether a particular expression will be needed later can
be very difficult at the time the expression is formulated, although it is trivial to
do after all equations are formulated.
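Once all equations exist, removing the unneeded ones is a single backward pass. A minimal sketch, assuming the generated code is an ordered list of (VAR EXPR) assignments in prefix-list form and that the required outputs are known; the function names are hypothetical.

```lisp
;;; Removing assignments that no output depends on (sketch).

(defun symbols-in (expr)
  "List of the symbols referenced in EXPR (operators excluded)."
  (cond ((symbolp expr) (list expr))
        ((atom expr) '())
        (t (mapcan #'symbols-in (rest expr)))))

(defun remove-unused-assignments (assignments outputs)
  "Keep only the assignments (VAR EXPR) whose VAR is needed, directly or
indirectly, to compute OUTPUTS.  Scans from the last assignment back to
the first, which is possible only after all equations are formulated."
  (let ((needed (copy-list outputs))
        (kept '()))
    (dolist (a (reverse assignments) kept)
      (destructuring-bind (var expr) a
        (when (member var needed)
          (push a kept)                                  ; restores original order
          (setf needed (union needed (symbols-in expr))))))))
```

For example, with assignments Z1 = A*B, Z2 = Z1*0, Z3 = Z1 + C, Z4 = SIN(Q) and the single output Z3, only Z1 and Z3 are kept.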

10.  Large  matrices  are  partitioned  into  smaller  matrices,  based  on  the  topology  of  the 
system,  before  general  numeric  matrix  solution  methods  are  invoked. 


Equations  of  Motion  for  a  Multibody  System 

To  provide  an  idea  of  the  sorts  of  mathematical  operations  that  must  be  included  in 
the  computer  algebra,  a  dynamics  formalism  is  summarized.1 

The  multibody  formalism  presently  used  in  AUTOSIM  is  based  on  the  analysis 
method  of  Kane  and  Levinson  [9].  For  a  holonomic  system,  or  a  nonholonomic  system  in 
which  some  speeds  are  constant,  the  following  four  steps  are  performed: 

1. Position analysis. For each body in the system, except the inertial reference,
develop a direction cosine matrix relating the body to its parent. Also, introduce a
generalized coordinate for each degree of freedom of the joint connecting the
body to its parent. For the entire system there are $n$ generalized coordinates, $q_i$
($i = 1, \ldots, n$).

2. Velocity analysis. For each body, introduce a generalized speed to account for
each degree of freedom. For a nonholonomic system, there are $p$ generalized
speeds, $u_i$ ($i = 1, \ldots, p$), where $p \le n$, and $m$ constant speeds, $u_i$ ($m = n - p$, $i = p+1, \ldots, n$).

For each body B, derive an expression for the derivatives of the generalized
coordinates in terms of the generalized speeds. Altogether, there are $n$ such
kinematical equations.

Then,  for  each  body  B,  formulate  expressions  for  the  following  quantities: 


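a. $n$ partial velocities for the mass center, $B^*$, defined as

$$\mathbf{v}_i^{B^*} = \frac{\partial \mathbf{v}^{B^*}}{\partial u_i}, \qquad (i = 1, \ldots, n) \tag{1}$$

b. $n$ partial angular velocities, defined as

$$\boldsymbol{\omega}_i^{B} = \frac{\partial \boldsymbol{\omega}^{B}}{\partial u_i}, \qquad (i = 1, \ldots, n) \tag{2}$$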


1  This  summary  does  not  cover  all  of  the  details  of  how  expressions  are  introduced  for  a  specific 
system.  Rules  are  applied,  based  on  the  topology  of  the  system.  A  summary  of  the  rules  is  beyond  the 
scope  of  this  paper. 




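c. central acceleration remainder, defined as

$$\mathbf{a}_r^{B^*} = \frac{d\mathbf{v}^{B^*}}{dt} - \sum_{i=1}^{p} \mathbf{v}_i^{B^*}\,\dot{u}_i \tag{3}$$

d. angular acceleration remainder, defined as

$$\boldsymbol{\alpha}_r^{B} = \frac{d\boldsymbol{\omega}^{B}}{dt} - \sum_{i=1}^{p} \boldsymbol{\omega}_i^{B}\,\dot{u}_i \tag{4}$$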

3. Implicit equations. The implicit equations of motion are written in matrix form
as:

$$\mathbf{M}\,\dot{\mathbf{u}} = \mathbf{f} \tag{5}$$

where $\mathbf{M}$ is the $p \times p$ mass matrix, $\dot{\mathbf{u}}$ is a column array of the $p$ derivatives of the
generalized speeds, and $\mathbf{f}$ is a column array called the generalized force array.

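a. The elements of the mass matrix are defined as

$$m_{ij} = \sum_{B=1}^{N_{\mathrm{bodies}}} \left( \boldsymbol{\omega}_i^{B} \cdot \mathbf{I}^{B} \cdot \boldsymbol{\omega}_j^{B} + m^{B}\, \mathbf{v}_i^{B^*} \cdot \mathbf{v}_j^{B^*} \right) \tag{6}$$

where $m^B$ is the mass of body B and $\mathbf{I}^B$ is the inertia dyadic of B.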
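b. The elements of the generalized force array are defined as

$$f_i = \sum_{B=1}^{N_{\mathrm{bodies}}} \left[ \left( \sum_{f=1}^{N_{B,f}} \mathbf{F}_f - m^{B}\,\mathbf{a}_r^{B^*} \right) \cdot \mathbf{v}_i^{B^*} + \left( \sum_{t=1}^{N_{B,t}} \mathbf{T}_t - \mathbf{I}^{B} \cdot \boldsymbol{\alpha}_r^{B} - \boldsymbol{\omega}^{B} \times \mathbf{I}^{B} \cdot \boldsymbol{\omega}^{B} \right) \cdot \boldsymbol{\omega}_i^{B} \right] \tag{7}$$

where $\sum_{t=1}^{N_{B,t}} \mathbf{T}_t$ designates the sum of all torques applied to body B about its
center of mass by force- and moment-producing components and $\sum_{f=1}^{N_{B,f}} \mathbf{F}_f$
designates the sum of all forces acting on the body. Forces and moments
arising from the kinematical constraints need not be included, because
they drop out when the dot-product is taken with the partial velocities.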

4. Explicit equations. The $p$ implicit equations in eq. 5 are solved to obtain values of
the accelerations in $\dot{\mathbf{u}}$.
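As a worked illustration of steps 1 to 4 (a hypothetical example, not one taken from this paper), consider a single body B pinned to the inertial frame N and rotating about the common axis $\mathbf{n}_3 = \mathbf{b}_3$ through the angle $q_1$, with $\mathbf{b}_1 = \cos q_1\,\mathbf{n}_1 + \sin q_1\,\mathbf{n}_2$. Assume mass $m^B = m$, mass center $B^*$ at distance $L$ from the pin along $\mathbf{b}_1$, moment of inertia $I$ of B about the $\mathbf{b}_3$ axis through $B^*$ (taken as a principal axis), and gravity $g$ along $\mathbf{n}_1$. The pin reaction is non-working and is omitted, and gravity acts at $B^*$, so it applies no torque about the mass center.

$$
\begin{aligned}
&\dot q_1 = u_1, \qquad \boldsymbol{\omega}^B = u_1\,\mathbf{b}_3, \qquad \mathbf{v}^{B^*} = \boldsymbol{\omega}^B \times L\,\mathbf{b}_1 = L u_1\,\mathbf{b}_2,\\
&\boldsymbol{\omega}_1^B = \mathbf{b}_3, \qquad \mathbf{v}_1^{B^*} = L\,\mathbf{b}_2,\\
&\mathbf{a}_r^{B^*} = -L u_1^2\,\mathbf{b}_1, \qquad \boldsymbol{\alpha}_r^B = \mathbf{0}, \qquad \boldsymbol{\omega}^B \times \mathbf{I}^B \cdot \boldsymbol{\omega}^B = \mathbf{0},\\
&m_{11} = \mathbf{b}_3 \cdot \mathbf{I}^B \cdot \mathbf{b}_3 + m L^2 = I + m L^2,\\
&\mathbf{F} = m g\,\mathbf{n}_1 = m g\,(\cos q_1\,\mathbf{b}_1 - \sin q_1\,\mathbf{b}_2),\\
&f_1 = \left(\mathbf{F} - m\,\mathbf{a}_r^{B^*}\right) \cdot \mathbf{v}_1^{B^*} = -m g L \sin q_1,\\
&(I + m L^2)\,\dot u_1 = -m g L \sin q_1 .
\end{aligned}
$$

The last line is the familiar compound-pendulum equation; the mass-matrix element has the same form as the I + M*L**2 example listed in Table 1.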


The above analysis method immediately applies several of the simplification methods
described earlier. First, it permits the introduction of “natural” state variables, including
generalized speeds that are not derivatives of the generalized coordinates (technique no.
1). If there is reason to think that a certain set of variables is in fact optimal, the analyst is
free to use that set. (In contrast, some other multibody formalisms are built upon a
predefined set of state variables.)

A  second  potential  simplification  occurs  because  non-working  forces  and  moments 
are  never  introduced  (technique  no.  3). 

To  some  extent  the  efficient  performance  of  matrix  operations  (technique  no.  10)  is 
also  supported.  When  intermediate  variables  are  introduced  appropriately,  the  symbolic 
solution  of  the  acceleration  equations  in  step  4  results  in  an  efficiency  at  least  as  good  as 
can  be  obtained  from  a  carefully  partitioned  formulation.  However,  it  should  be  noted 
that  a  potential  drawback  of  this  approach  is  that  the  structure  of  the  system  is  “lost”  in 
the  building  of  a  mass  matrix  which  is  later  decomposed.  Recently,  a  number  of 
recursive  “Order-n”  formulations  have  been  published  that  offer  greater  efficiency  for 
systems  with  a  “chain”  topology  when  the  length  of  the  chain  exceeds  a  certain  number, 
generally  around  n=6  [10,  11,  12].  For  models  of  ground  vehicles,  the  formulation 
presented  here  is  usually  better.  However,  for  systems  with  chain  topologies,  a  recursive 
order-n  formulation  should  be  considered. 

Representing  Symbolic  Data 

The  methods  required  to  manipulate  symbolic  expressions  are  derived  from  the 
design  of  the  computer  data  types  that  are  used  to  represent  algebraic  expressions  and 
other  entities.  The  AUTOSIM  implementation  was  written  in  the  language  Common 
Lisp  [13],  called  simply  “Lisp”  in  the  remainder  of  the  paper. 

Overview  of  Data  Objects 

New  data  types  are  implemented  in  Lisp  as  structures,  with  slots  assigned  to  various 
entities  associated  with  the  data.  In  AUTOSIM,  structures  are  used  as  objects  to  support 
object-oriented  programming.1  Objects  facilitate  data  abstraction  by  allowing  programs 
to  manipulate  them  without  knowledge  of  the  details  of  their  internal  representation. 
Further,  generic  operators  work  by  obtaining  procedures  for  manipulating  objects  based 
on  the  types  of  the  operands.  For  example,  the  generic  function  add  works  for  two 
arguments  by  looking  at  the  types  of  the  two  arguments,  and  looking  up  that  pair  in  a 
dispatch  table  of  installed  specialized  functions.  The  specialized  function  from  the  table 
is  then  invoked.  The  specialized  function  can  be  very  specific  in  terms  of  the  types  of 
objects  it  understands,  since  it  need  not  understand  when  it  should  be  invoked  or  what  to 
do  with  other  types  of  data.  To  modify  the  way  a  generic  function  operates  on  a 
particular  type  of  object,  one  or  more  new  specialized  functions  are  “installed”  in  the 
system. This style of programming allows new types of objects and new operations to be
incorporated into the system without modifying existing software.
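A minimal sketch of dispatch for a binary generic function, using a hash table keyed on the pair of argument types. The names *add-table*, install-add, expr-type, and add are hypothetical, and the type tags are simplified stand-ins for AUTOSIM's expression types.

```lisp
;;; Binary generic operator with a dispatch table keyed on argument types.

(defvar *add-table* (make-hash-table :test #'equal)
  "Maps a list (TYPE1 TYPE2) to the specialized function used by ADD.")

(defun install-add (type1 type2 fn)
  "Install FN as the specialized ADD function for arguments of TYPE1, TYPE2."
  (setf (gethash (list type1 type2) *add-table*) fn))

(defun expr-type (x)
  "Simplified type tag: NUMBER, SYM, SUM (a list headed by +), or OTHER."
  (cond ((numberp x) 'number)
        ((symbolp x) 'sym)
        ((eq (first x) '+) 'sum)
        (t 'other)))

(defun add (x y)
  "Look up the pair of argument types and apply the installed function."
  (let ((fn (gethash (list (expr-type x) (expr-type y)) *add-table*)))
    (if fn
        (funcall fn x y)
        (list '+ x y))))        ; no specialist installed: unsimplified sum

;; Each specialized function knows only about its own pair of types.
(install-add 'number 'number #'+)
(install-add 'sum 'sym (lambda (s y) (append s (list y))))
```

Supporting a new pair of types only requires another install-add call; add itself is never edited.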


1  Extensive  object-oriented  versions  of  Lisp  are  readily  available,  but  are  not  standardized.  To  ensure 
portability,  AUTOSIM  is  written  completely  in  standard  Common  Lisp.  The  object-oriented  extensions  are 
a  part  of  AUTOSIM. 
Figure 1. Hierarchy of AUTOSIM and Lisp data objects.

Lisp includes over 40 types of data objects. In addition, new types are defined by
the use of structures. Expressions in AUTOSIM can represent scalars, vectors, or dyadics.
They are composed of two data types: numbers and expressions. Figure 1 shows a
hierarchy of data types used in AUTOSIM, as they relate to data types already in Lisp.
Each type of object “inherits” from the type to its immediate left in the figure. For
example, an object of type cos is also of types trig, func, and expression.
Characteristics of the types trig, func, and expression are “inherited” by objects of
type cos, and functions that work with objects of type trig, func, and expression
also work with objects of type cos.
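AUTOSIM provides its own object-oriented extensions (see the footnote above). As a stand-in, standard Common Lisp defstruct with the :include option gives the same kind of single inheritance. This minimal sketch builds the chain expression, func, trig, cos with a few slots taken from Table 1; the structure is named cos-expr because COS is a symbol of the COMMON-LISP package and may not be defined as a structure name.

```lisp
;;; Single inheritance with standard DEFSTRUCT :INCLUDE (sketch).
;;; Slot names follow Table 1; the chain is expression -> func -> trig -> cos.

(defstruct expression
  (type 'scalar)        ; scalar, vector, or dyadic
  (const-or-var 'var)   ; const or var
  units name dxdt)

(defstruct (func (:include expression))
  function args)        ; function name and argument expressions

(defstruct (trig (:include func)))

;; Named COS-EXPR because CL:COS may not be defined as a structure name.
(defstruct (cos-expr (:include trig)))
```

An object made with make-cos-expr satisfies trig-p, func-p, and expression-p, and accessors written for the parent types, such as func-args and expression-type, work on it unchanged.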

The  data  objects  in  the  figure  are  shown  in  four  groups,  related  to  (1)  computer 
algebra,  (2)  the  multibody  system,  (3)  the  numerical  simulation  program,  and  (4) 
additional  native  Lisp  objects.  All  native  Lisp  forms  are  shown  in  italics,  and  those  used 
extensively  in  AUTOSIM  are  shown  in  bold  italics.  The  multibody  analyses  and 
simplification  techniques  are  applied  by  manipulating  these  objects. 

Computer  Algebra 

Algebraic  expressions  are  built  from  number  and  expression  objects,  whose 
characteristics  are  listed  in  Table  1. 

Of the expressions listed in Table 1, four are elementary types from which the other
compound types are built. The elementary types are the number, the sym, the
indexed-sym, and the uv. When printed as Fortran source code, the sym designates a
variable and an indexed-sym usually designates an array element. Unit-vectors are
never written in the final Fortran output, but can be entered and read by the analyst.
(They are printed with enclosing square brackets.)

Recall  that  most  of  the  quantities  appearing  in  the  dynamics  equations  are  vectors  and 
dyadics.  Virtually  all  previously  developed  automated  multibody  analysis  methods 
define  directions  ahead  of  time,  so  that  vectors  can  be  described  in  terms  of  arrays  of 
scalar  quantities  with  predefined  directions.  This  approach  works  fine  for  the  rigid  body 
motions,  because  expressions  can  generally  be  formulated  in  terms  of  unit-vectors  fixed 
in the body with which they are associated. However, active forces and moments can
assume arbitrary orientations. For this reason, introducing arbitrary forces has not been
possible with symbolic analysis programs in the past, limiting the level of automation
possible in the modeling. This limit is averted by including unit-vectors as a primitive
entity in the computer algebra representation. Vector and dyadic expressions can be
introduced using simple mathematical notation, and then manipulated automatically.
Also, vector velocities and accelerations can be projected in any direction (via the
dot-product operation) to define scalar output variables.
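A minimal sketch of why primitive unit-vectors help, assuming a hypothetical representation in which a vector is a list of (COEFFICIENT . UNIT-VECTOR) terms and a unit-vector is (BODY . AXIS), so (B . 1) stands for b1. Dot products of unit-vectors fixed in the same body reduce to 1 or 0. For different bodies the sketch leaves an unevaluated (DOT ...) term, standing for a direction cosine that a fuller implementation would obtain from the cos-matrix slots of the bodies.

```lisp
;;; Vectors as lists of (COEF . UV) terms, a UV being (BODY . AXIS) (sketch).

(defun uv-dot (u1 u2)
  "Dot product of two unit-vectors.  Same body: 1 or 0.  Otherwise return an
unevaluated (DOT U1 U2) standing for a direction-cosine entry."
  (if (eq (car u1) (car u2))
      (if (eql (cdr u1) (cdr u2)) 1 0)
      (list 'dot u1 u2)))

(defun simple-mul (a b)
  "Multiply two scalar terms, folding the numbers 0 and 1."
  (cond ((or (eql a 0) (eql b 0)) 0)
        ((eql a 1) b)
        ((eql b 1) a)
        (t (list '* a b))))

(defun vec-dot (v1 v2)
  "Dot product of two vectors, returned as the list of nonzero scalar terms
of the resulting sum."
  (let ((terms '()))
    (dolist (t1 v1)
      (dolist (t2 v2)
        (let ((term (simple-mul (simple-mul (car t1) (car t2))
                                (uv-dot (cdr t1) (cdr t2)))))
          (unless (eql term 0)
            (push term terms)))))
    (nreverse terms)))
```

Projecting the velocity u1 b1 + u2 b2 onto b1 leaves only U1; projecting u1 b1 onto n1 keeps the symbolic direction cosine (DOT (B . 1) (N . 1)).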

Nested  expressions  (simplification  technique  no.  4)  are  supported  in  the  designs  of  the 
compound  types.  For  example,  the  expressions  in  the  list  of  factors  of  a  prod  can  be 
sums,  powers,  funcs,  etc.  There  are  no  limits  to  the  level  of  nesting  allowed  (other 
than  computer  memory). 

The  meta-type  expression  defines  a  repertoire  of  qualities  associated  with  all 
expression  types.  For  example,  the  units  of  any  expression  (if  known)  are  kept  in  the 
units  slot;  the  name  of  the  expression  (if  there  is  one)  is  kept  in  the  name  slot;  the 
derivative  with  respect  to  time,  if  known,  is  kept  in  the  slot  dxdt. 




Table 1. Summary of AUTOSIM expression types.

| Object | Primary slots | Definition | Examples |
|---|---|---|---|
| number | | number | |
| expression | type, small-order, sort-code, dxdt, sym-value, const-or-var, units, name | meta-type for all expression objects | |
| dyad | uv1, uv2 | dyad | |
| func | function, args | function that will be written into numerical simulation code | TIRE(FZ, SLIP), ASIN(X), ATAN2(X, Y) |
| prod | ... | product of expressions | 2.0*M*SIN(Q(1)) |
| sum | terms | sum of expressions | I + M*L**2 |
| sym | symbol, default, hide | symbol for a scalar parameter or variable | M |
| indexed-sym | ... | indexed symbol for a scalar parameter or variable | |
| uv | symbol, body, dot-products, cross-products | unit-vector | |

Expressions are classified in several ways besides their object type. The type slot tells
whether an expression is a scalar, vector, or dyadic. Powers, syms, and
indexed-syms always have their type slot set to the value scalar. Also, all numbers
are by definition scalar. A uv has its type slot set to vector, and a dyad has it set to
dyadic. The prod and sum objects can be any one of the three types, depending on the
types of their components.
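A minimal sketch of how the type slot of a prod or sum could be derived from its components. It assumes that ranks add under multiplication (scalar 0, vector 1, dyadic 2, so a product of two vectors is dyadic, like a dyad built from uv1 and uv2) and that all terms of a sum must share one type. Both rules and all names are assumptions, not taken from AUTOSIM.

```lisp
;;; Deriving the type slot of products and sums (sketch; rules assumed).

(defun type-rank (type)
  (ecase type (scalar 0) (vector 1) (dyadic 2)))

(defun rank-type (rank)
  (ecase rank (0 'scalar) (1 'vector) (2 'dyadic)))

(defun prod-type (factor-types)
  "Ranks add under multiplication: scalar*vector is a vector and
vector*vector is a dyadic.  A total rank above 2 signals an error."
  (rank-type (reduce #'+ factor-types :key #'type-rank)))

(defun sum-type (term-types)
  "All terms of a sum must have the same type, which the sum inherits."
  (let ((first-type (first term-types)))
    (unless (every (lambda (ty) (eq ty first-type)) term-types)
      (error "Cannot add terms of types ~S" term-types))
    first-type))
```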

The const-or-var slot tells whether an expression is a constant or a variable. It is
mainly used for scalar expressions, to identify expressions that can be precomputed. The
value of this slot is set for a sym or an indexed-sym when it is created. When
compound expressions are examined, the const-or-var slot is set to const if all
expressions contained in the compound object are constants; otherwise it is set to var.

Some  of  the  other  slots  are  described  later,  in  the  context  of  the  algebraic  operations 
used  on  expressions. 


Multibody  System 

A  multibody  system  is  composed  of  bodies  influenced  by  forces  and  moments  and 
connected  to  each  other  by  joints.  Points  are  fixed  locations  in  bodies  used  to  define  joint 
attachments,  force  attachments,  and  points  of  interest  needed  to  define  output  variables. 

body  —  A  data  structure  called  a  body  is  used  to  represent  each  body  in  the  system, 
together  with  the  kinematics  of  one  joint  and  a  complete  coordinate  system  fixed  in  the 
body.  Some  of  the  slots  in  a  body  are  shown  in  Table  2.  The  right-most  column 
indicates  whether  the  slot  is  mainly  associated  with  the  body,  the  joint,  or  the  coordinate 
system.  Massless  bodies  (with  the  mass  and  inertia  slots  set  to  zero)  can  be  used  to 
introduce  compound  joints  or  intermediate  reference  frames.  Also,  bodies  with  zero 
degrees  of  freedom  can  be  used  to  add  (or  subtract)  mass  or  inertia  to  an  existing  body. 

By  imposing  a  one-to-one  relationship  between  bodies  and  joints,  this  design  for 
describing  a  body  organizes  the  multibody  system  into  a  tree  topology.  In  general,  a  tree 
topology  consists  of  abstract  entities  called  nodes.  One  node  is  the  “root  node”  that  starts 
the  tree,  and  which  has  no  “parent  node.”  Every  other  node  in  the  tree  is  defined  as  a 
“child”  of  a  previously  defined  node.  An  example  tree  is  shown  in  Figure  2,  for  8  nodes 
labeled  by  capital  letters.  Parent-child  relations  are  shown  with  lines,  with  the  parent 
node above the child node(s). The root node is N; nodes A and B have N as their
“parent.” Thus, A and B are the “children” of N. B has three children. Nodes G, C, D,
and E all have no children, and are called “leaves” of the tree.


For  a  multibody  system,  the  nodes  are  rigid  bodies,  and  the  connecting  lines  are  joints 
between  the  bodies.  The  body  object  describes  the  tree  topology  simply  by  including 
slots  for  the  parent  and  children.  For  example,  if  the  tree  in  Figure  2  represents  a 
multibody system, the body labelled B would identify N in its parent slot, and the list (C
D E) in its children slot. The body N (the root node) would contain NIL in its parent
slot and the list (A B) in its children slot.


Methods  used  previously  to  represent  multibody 
systems  have  involved  arrays  that  indicate  relationships 
between  bodies.  As  a  minimum,  a  body-connection 
matrix  is  needed  to  indicate  which  bodies  are  connected 
by  joints  [8,  14].  Other  matrices  are  needed  to  indicate 
parent-child  relationships  and  applications  of  constraint 
equations.  The  representation  presented  here  is  much 
simpler  and  permits  reconstruction  of  the  entire  tree  starting  from  any  body  in  the  tree, 
using  only  body  objects.  It  also  facilitates  analyses  that  require  that  the  bodies  be 
processed in a certain sequence. For example, Lisp code is shown below to apply a
function func to each body in an order such that the parent is always processed before
the child.
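A minimal sketch of the claim that the whole tree can be recovered from any body, using a hypothetical body structure with only the parent and children slots. The tree follows the description of Figure 2; the figure is not reproduced here, so the placement of the eighth node (called F, between A and G) is an assumption.

```lisp
;;; A minimal, hypothetical BODY structure holding only the tree slots.
;;; (Bodies point to each other, so set *PRINT-CIRCLE* before printing one.)
(defstruct body name parent (children '()))

(defun add-body (name parent)
  "Make a body named NAME as the last child of PARENT (NIL for the root)."
  (let ((b (make-body :name name :parent parent)))
    (when parent
      (setf (body-children parent)
            (append (body-children parent) (list b))))
    b))

(defun root-body (body)
  "Follow parent slots from any BODY up to the root of its tree."
  (if (body-parent body)
      (root-body (body-parent body))
      body))

(defun tree-body-names (body)
  "Names of all bodies in the tree containing BODY, parents before children."
  (labels ((collect (b)
             (cons (body-name b) (mapcan #'collect (body-children b)))))
    (collect (root-body body))))
```

Starting from the leaf G, root-body returns N and tree-body-names lists all eight bodies. Because this structure provides body-children, the two traversal functions shown below can be run on the same tree.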


Figure  2.  Example  Tree. 
Table 2. Some of the slots in a body.

| Slot name | Definition | Describes |
|---|---|---|
| symbol | symbol for user to reference body | body, joint |
| name | descriptive name (string) written in output files of simulation code | body, joint |
| uvs | array of 3 unit-vectors that define 1-2-3 axis directions of coordinate system | coordinate system |
| parent | parent body (body) | joint |
| children | child bodies (list) | joint |
| cos-matrix | 3x3 direction cosine matrix that relates unit-vectors of this body to the unit-vectors of the parent body (array) | coordinate system |
| mass | expression for mass of body | body |
| inertia | expression for the inertia dyadic of the body | body |
| o-point | origin of coordinate system (also, joint attachment point in this body) (point) | coordinate system, joint |
| cm-point | center-of-mass location (point) | body |
| joint-point | joint attachment in parent body (point) | joint |
| new-rot-vars | rotational generalized coordinates introduced for this body (list) | joint |
| new-rot-speeds | rotational generalized speeds introduced for this body (list) | joint |
| new-trans-vars | translational generalized coordinates introduced for this body (list) | joint |
| new-trans-speeds | translational generalized speeds introduced for this body (list) | joint |
| abs-w | absolute rotational velocity of this body | coordinate system |
| abs-vj | absolute velocity of the joint-point | coordinate system |
| worksheet | another structure used to keep various expressions used for the dynamics analysis method used. For Kane’s method, a structure called a kane is used which includes partial velocities, acceleration remainders, etc. | dynamics formalism |
;;; apply function func to each body from the root down

(defun apply-func-to-tree-top-down (func body)
  (funcall func body)
  (dolist (b (body-children body))
    (apply-func-to-tree-top-down func b)))

The  order  of  processing  occurs  from  parent  to  child  because  the  function  is  first  applied 
to  the  body,  and  then  the  apply-func-to-tree-top-down  function  is  recursively 
applied  to  the  children  of  the  body.  By  reversing  two  operations  in  the  above  function,  so 
that  the  recursion  occurs  before  the  body  is  processed,  the  children  are  always  processed 
first: 

;;; apply function func to each body from the leaves up

(defun apply-func-to-tree-bottom-up (func body)
  (dolist (b (body-children body))
    (apply-func-to-tree-bottom-up func b))
  (funcall func body))

When bodies are constrained in their motions due to joints, the vector expressions
developed for the body motions can be defined recursively, based on the motions of
another body and the relative motion between bodies. The above function
apply-func-to-tree-top-down is representative of the functions employed in AUTOSIM
to apply the dynamics formalism shown earlier.
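A minimal sketch of one such recursive definition, for the abs-w slot of Table 2: each body's absolute angular velocity is its parent's plus its own angular velocity relative to the parent, so the parent must be processed first. Vector expressions are represented here by plain lists of terms, and the structure and function names are hypothetical.

```lisp
;;; Absolute angular velocity defined recursively from the parent (sketch).
;;; Vectors are plain lists of terms; a sum of vectors is their append.
;;; (Bodies point to each other, so set *PRINT-CIRCLE* before printing one.)
(defstruct moving-body
  name parent (children '())
  rel-w     ; angular velocity relative to the parent
  abs-w)    ; absolute angular velocity, filled in by COMPUTE-ABS-W

(defun add-moving-body (name parent rel-w)
  "Make a child of PARENT whose angular velocity relative to it is REL-W."
  (let ((b (make-moving-body :name name :parent parent :rel-w rel-w)))
    (when parent
      (push b (moving-body-children parent)))
    b))

(defun compute-abs-w (body)
  "Set ABS-W for BODY and its descendants, parents before children:
ABS-W(body) = ABS-W(parent) + REL-W(body)."
  (let ((parent (moving-body-parent body)))
    (setf (moving-body-abs-w body)
          (append (and parent (moving-body-abs-w parent))
                  (moving-body-rel-w body))))
  (dolist (child (moving-body-children body))
    (compute-abs-w child)))
```

For a chain N, A, B with relative angular velocities u1 a3 and u2 b1, body B ends up with the absolute angular velocity u1 a3 + u2 b1.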

Point — Points are used to define locations of interest in bodies, such as origins of the
coordinate systems, centers of mass, attachment points, etc. Each body contains at least
three points, as shown in Table 2. Additional points can be defined as
needed to identify attachment points for forces or as points of interest for output
variables. Table 3 shows how a point is defined in the system.


Table 3. Some of the slots in a point.

| Slot name | Definition |
|---|---|
| symbol | symbolic name (symbol) for user to identify point |
| name | descriptive name (string) of point |
| body | body that contains point |
| coords | array of 3 coordinates of point in coordinate system of body |

forcem — Force-producing elements are represented by objects called forces and
moment-producing elements are represented by moments. Both types, which inherit
from the meta-type forcem, are summarized in Table 4.


As each force is introduced by the analyst, it is put into a list of all forces of the
multibody system. Similarly, all of the moments are kept in a list. The summations of
forces and moments, needed for eq. 7 in the dynamics analysis, are obtained for each
body by going through the lists of forces and moments and checking to see if the current
body is one of the two bodies contained in the two body slots of the forcem. If the
body being analyzed is the one contained in the body1 slot, the forcem is applied with
positive magnitude. When the body is matched with the body2 slot, the forcem is
applied with negative magnitude.

Table 4. Some of the slots in a forcem.

| Slot name | Definition |
|---|---|
| symbol | symbolic name for user to identify forcem |
| name | descriptive name of forcem used when writing documentation |
| dir | expression that gives direction in which forcem acts |
| exp | expression that gives scalar magnitude of forcem |
| body1 | first body on which forcem acts |
| body2 | second body from which forcem acts |
| point1 | point on line of action of force on body 1 (force only) |
| point2 | point on line of action of force on body 2 (force only) |
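A minimal sketch of the sign convention for applying a forcem to a body, with a hypothetical forcem structure holding only the slots the lookup needs. Each matching forcem is returned with a sign of +1 when the body is in its body1 slot and -1 when it is in its body2 slot.

```lisp
;;; Collecting the forcems that act on one body, with their signs (sketch).

(defstruct forcem symbol body1 body2)

(defun forcems-on (body forcems)
  "List of (SIGN . SYMBOL) for each forcem acting on BODY: +1 when BODY is in
the body1 slot, -1 when it is in the body2 slot."
  (loop for fm in forcems
        when (eq body (forcem-body1 fm))
          collect (cons 1 (forcem-symbol fm))
        when (eq body (forcem-body2 fm))
          collect (cons -1 (forcem-symbol fm))))
```

A spring between bodies A and B then appears with positive sign in the summation for A and negative sign in the summation for B.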

The point1 and point2 slots in a force are used to obtain expressions for the torque
applied to a body if the force acts on that body and its line of action does not pass through
the center of mass. That is, torque is defined as

$$\mathbf{T} = \mathbf{r} \times \mathbf{f} \tag{8}$$

where $\mathbf{r}$ is the position vector going from the center of mass to the point on the body
through which the force passes, and $\mathbf{f}$ is the force (the product of the expressions in the
dir and exp slots of the force object).
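AUTOSIM forms eq. 8 symbolically; the numeric sketch below only illustrates the arithmetic, with vectors written as lists of three components in one common frame, an assumption made to keep the example small. The function names are hypothetical.

```lisp
;;; Torque of a force about the mass center, T = r x f (numeric sketch).
;;; Vectors are lists (x y z) expressed in one common frame.

(defun cross3 (r f)
  "Cross product R x F."
  (destructuring-bind ((rx ry rz) (fx fy fz)) (list r f)
    (list (- (* ry fz) (* rz fy))
          (- (* rz fx) (* rx fz))
          (- (* rx fy) (* ry fx)))))

(defun torque-about-cm (point cm force)
  "Torque about CM of FORCE acting through POINT; r runs from CM to POINT."
  (cross3 (mapcar #'- point cm) force))
```

A force of 2 along y applied one unit along x from the mass center gives a torque of 2 about z; a force whose line of action passes through the mass center gives zero torque.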

Numerical  Simulation  Program 

In addition to expressions and the multibody system, the numerical simulation
program produced as output by AUTOSIM is represented with objects. The three
most significant are the types eqs, outvar, and declaration.

eqs  —  a  sequence  of  assignment  statements  is  represented  by  an  object  called  an  eqs. 
Some  of  the  sequences  that  are  generated  and  manipulated  are  the  kinematical  equations, 
the  dynamical  equations,  the  trigonometric  functions  used  in  other  equations,  and  the 
output  variables. 

outvar  —  information  about  a  variable  that  will  be  produced  as  output  by  the 
simulation  code  is  represented  by  the  outvar  object.  It  includes  a  short  name,  a  long 
name,  a  generic  name,  an  expression,  and  units.  Before  the  simulation  code  is  written, 
the  list  of  outvars  is  processed  to  ensure  that  statements  are  generated  to  compute  all 
dependent  variables  defined  by  the  analyst.  The  labeling  information  is  written  by  the 
simulation in such a way that output files can be handled automatically by
postprocessing software for graphics and analysis.
declaration — a list of all variables of a certain type (REAL, INTEGER, etc.) that
must be declared in a specific subroutine module of the simulation code is represented in
a declaration object.

In  its  present  form,  all  output  source  code  is  written  in  the  Fortran  language. 
However, the representation of the simulation program in eqs, outvar, and
declaration  objects  is  not  dependent  on  the  language.  Generating  simulation  code  in 
a  different  language  (e.g.,  C)  is  mainly  a  matter  of  telling  these  objects  to  print 
themselves  differently,  according  to  the  syntax  of  the  target  language. 
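A minimal sketch of the idea that one representation can be printed for several target languages. Expressions are prefix lists covering only sums, products, powers, and function calls, and the function names are hypothetical; this is not AUTOSIM's printer. Powers print as ** in Fortran and as pow (declared in C's math.h) in C.

```lisp
;;; One representation, printed for two target languages (sketch).

(defun expr-string (expr lang)
  "Render prefix EXPR (+, *, EXPT, or a function call) for LANG, :FORTRAN or :C."
  (flet ((sub (e) (expr-string e lang)))
    (cond ((symbolp expr)
           (if (eq lang :c)
               (string-downcase (symbol-name expr))
               (symbol-name expr)))
          ((atom expr) (princ-to-string expr))          ; numbers
          ((eq (first expr) '+)
           (format nil "(~{~A~^ + ~})" (mapcar #'sub (rest expr))))
          ((eq (first expr) '*)
           (format nil "~{~A~^*~}" (mapcar #'sub (rest expr))))
          ((eq (first expr) 'expt)                      ; power
           (if (eq lang :c)
               (format nil "pow(~A, ~A)" (sub (second expr)) (sub (third expr)))
               (format nil "~A**~A" (sub (second expr)) (sub (third expr)))))
          (t                                            ; function call
           (format nil "~A(~{~A~^, ~})"
                   (sub (first expr)) (mapcar #'sub (rest expr)))))))

(defun statement-string (var expr lang)
  "An assignment statement in the syntax of LANG."
  (ecase lang
    (:fortran (format nil "      ~A = ~A" (expr-string var lang) (expr-string expr lang)))
    (:c (format nil "~A = ~A;" (expr-string var lang) (expr-string expr lang)))))
```

The same list (* M (EXPT L 2)) prints as the fixed-form Fortran statement `Z1 = M*L**2` or as the C statement `z1 = m*pow(l, 2);`.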

Computer  Algebra  Operations 

The  mathematical  operations  needed  to  derive  equations  of  motion  for  a  multibody 
system  and  generate  source  code  for  a  numerical  simulation  program  can  be  deduced 
from  the  material  presented  so  far.  From  the  point  of  view  of  a  software  implementation, 
there  are  four  levels  of  mathematics  operations  used:  (1)  operations  are  implicitly 
performed  when  a  compound  expression  object  is  created  (e.g.,  a  power  object 
represents  an  expression  raised  to  a  power,  a  prod  object  represents  the  multiplication  of 
expressions,  etc.),  (2)  several  primitive  algebra  operations  are  defined  that  use 
information  obtained  from  the  expression  objects  to  create  a  new  expression  object,  (3) 
higher-level  algebra  operations  are  defined  in  terms  of  primitive  operations,  and  (4)  some 
operations  are  performed  on  computer  code  that  has  already  been  generated.  This  last 
category  of  operations  is  analogous  to  a  human  programmer  “looking  over”  the  code  he 
or  she  has  written,  to  possibly  make  improvements. 

Making  Expression  Objects 

Each  definition  of  a  compound  expression  object  implies  an  operation.  The  functions 
that  make  objects  check  their  arguments  and  create  simpler  objects  when  possible.  In 
fact,  significant  algebraic  simplifications  are  performed  in  these  operations.  Table  5 
summarizes  simplifications  that  are  performed  by  creator  functions. 

The “small” quantity simplifications all occur in the make-sum operation. The term
with the minimum order of “smallness” is used as a reference and all other terms are
compared to it. Terms whose order exceeds that of the reference by some threshold are
dropped. Normally, the threshold for dropping small terms is 2. However, this value can
be modified if needed to perform alternate analyses that require higher order terms. For
example, AUTOSIM has been used to generate equations needed for a bifurcation
stability analysis in which all state variables are “small” and terms are kept up to the fifth
order [15].
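A minimal sketch of the small-term test, assuming each term carries its small-order as a number and reading “exceeds the reference by some threshold” as a difference of at least the threshold. With the default threshold of 2, the terms of the reference order and of the next order survive, which yields a linearized result. The function name is hypothetical.

```lisp
;;; Dropping negligible terms from a sum by order of smallness (sketch).

(defun drop-small-terms (terms &optional (threshold 2))
  "TERMS is a list of (EXPR . SMALL-ORDER).  The smallest order present is the
reference; a term is dropped when its order exceeds the reference by
THRESHOLD or more."
  (when terms
    (let ((ref (reduce #'min terms :key #'cdr)))
      (remove-if (lambda (term) (>= (- (cdr term) ref) threshold))
                 terms))))
```

For A + B*Q + C*Q*Q with Q small (orders 0, 1, and 2), the quadratic term is dropped; raising the threshold to 3 keeps it.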

Pains  are  taken  to  ensure  that  equivalent  occurrences  of  a  compound  expression 
always  are  created  the  same  way.  Sums  nested  within  sums  and  prods  within  prods  are 
removed.  E.g.,  the  sum  (A  +  B)  +  C  yields  (A  +  B  +  C),  rather  than  ((A  +  B)  +  C). 
Terms  and  factors  are  sorted  in  the  make-prod  and  make-sum  functions.  E.g.,  the 
product of B and A*C is A*B*C rather than B*A*C. A sign convention for sums is used
that  results  in  a  repeatable  formulation  for  a  given  sum,  regardless  of  how  it  is  obtained. 
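A minimal sketch of canonical construction for sums and products in prefix-list form: nested operands are flattened, operands are sorted by their printed form, and a sum whose terms are all negated is rewritten as a negated sum. The names canonical-sum and canonical-prod are hypothetical simplifications. The sign rule here checks only for explicitly negated terms, whereas AUTOSIM's make-sum uses the sym-value of the sum (Table 5).

```lisp
;;; Canonical construction of sums and products (simplified sketch).

(defun canonical-args (op args)
  "Splice in the arguments of nested (OP ...) forms, then sort by printed form."
  (sort (loop for a in args
              if (and (consp a) (eq (first a) op))
                append (copy-list (rest a))
              else
                collect a)
        #'string< :key #'prin1-to-string))

(defun canonical-sum (&rest terms)
  "Flattened, sorted sum.  If every term is a negation (- X), return the
negation of the sum of the X's, so (-A -B -C) is built as -(A + B + C)."
  (let ((terms (canonical-args '+ terms)))
    (cond ((null (rest terms)) (first terms))
          ((every (lambda (x) (and (consp x) (eq (first x) '-))) terms)
           (list '- (apply #'canonical-sum (mapcar #'second terms))))
          (t (cons '+ terms)))))

(defun canonical-prod (&rest factors)
  "Flattened product with factors sorted, so B*(A*C) is built as A*B*C."
  (let ((factors (canonical-args '* factors)))
    (if (null (rest factors))
        (first factors)
        (cons '* factors))))
```

Both (A + B) + C and C + (B + A) come out as (+ A B C), so equivalent sums always have one representation.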
Table 5. Simplifications performed by creator functions.

make-asin, make-cos, make-sin
  • if argument is the inverse function, return argument of argument (e.g., sin(sin⁻¹(x)) → x)
  • if argument is a number, evaluate
  • if argument is small, return truncated Taylor expansion

make-atan
  • same simplifications as for make-asin
  • if there are two arguments, divide both by GCF [e.g., tan⁻¹(A*X, A*Y) → ATAN2(X, Y)]

make-power
  • if base is a power, change exponent
  • if base is number, evaluate
  • if base includes small terms, drop if possible

make-prod
  • if the coefficient is 0, return 0
  • if the coefficient is 1 and there is one factor, return the factor
  •• if any numbers are included as factors, multiply them, include the result with the coefficient, and remove the numbers from the list of factors
  •• if any factors are prods, multiply coefficients and combine lists of factors (i.e., expand nested prods)
  •• if any factors can be combined into a power, make the substitution
  • else, sort factors and create prod object

make-sum
  •• compare “small-order” values of terms and remove those which are negligible
  •• check for trig identities: sin²x + cos²x → 1; 1 - sin²x → cos²x; 1 - cos²x → sin²x
  •• if any terms are sums, remove them and append terms from nested sums to existing list (i.e., expand nested sums)
  •• if sym-value of sum would be negative, negate all terms and return negative sum (prod with coefficient of -1)
  • else, sort terms and create sum object

Note: simplifications marked with •• mean that after the simplification is performed, the make- operation is called again recursively using updated arguments.

The expression (-A - B - C) would never be generated; instead, that result is always
represented as -(A + B + C).

Primitive  Algebra  Operations 

Table 6 summarizes the primitive mathematical operations. These operations involve
one or two arguments. In the object-oriented environment, each operator has an
associated dispatch table which is used to find a function for dealing with a specific type
of expression (for unary functions) or combination of types (for binary operations). For
example, to add a sum and a prod, the combination (sum prod) is looked up in the
appropriate table, and the function found from the table is applied to the two arguments.
Generally, the dispatch functions are small, simple, and specific to one combination of
expression types. Hence, they are easy to modify and debug. Also, new types of
expression objects and new functions can be "installed" in the system without modifying any
of the existing software.

Table 6. Summary of primitive AUTOSIM mathematics operations.

Operation      Argument(s)      Description
add            exp1, exp2       add two expressions
const-or-var   exp              is expression constant or variable?
cross          vexp1, vexp2     cross product between two vectors
dot            vexp1, vexp2     dot product between two vectors
dxdt           exp              derivative of expression with respect to time, in the inertial reference frame
gcf            exp1, exp2       greatest common factor of two expressions
mul            exp1, exp2       multiply two expressions
neg            exp              negate expression
partial        exp, symbol      partial derivative of expression with respect to symbol, in the inertial reference frame
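The dispatch-table idea can be illustrated with a small Python sketch. The classes Sym, Sum, and Prod and the registration helper defadd are hypothetical stand-ins for the AUTOSIM objects; the point is only that the (type, type) combination selects a small, specialized function, so a new combination is installed by registering one more entry rather than by editing add itself.

```python
class Sym:
    def __init__(self, name):
        self.name = name


class Sum:
    def __init__(self, *terms):
        self.terms = terms


class Prod:
    def __init__(self, *factors):
        self.factors = factors


# Dispatch table for the binary "add" operator, keyed by the two argument types.
ADD_TABLE = {}


def defadd(type1, type2):
    """Decorator that installs a function for one combination of types."""
    def install(fn):
        ADD_TABLE[(type1, type2)] = fn
        return fn
    return install


@defadd(Sym, Sym)
def add_sym_sym(a, b):
    return Sum(a, b)


@defadd(Sum, Prod)
def add_sum_prod(s, p):
    return Sum(*s.terms, p)  # append the prod as one more term


def add(x, y):
    fn = ADD_TABLE.get((type(x), type(y)))
    if fn is None:
        raise TypeError(f"no add method for ({type(x).__name__} {type(y).__name__})")
    return fn(x, y)


result = add(Sum(Sym("A"), Sym("B")), Prod(Sym("C"), Sym("X")))
print(len(result.terms))  # 3
```

Supporting a new expression type means writing a few more decorated functions; add and the existing entries stay untouched.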

Most  of  the  operators  in  the  table  work  as  might  be  expected.  Exceptions  and  special 
notes  are  provided  below. 

mul: When developing expressions through multiplication, products are not expanded,
in order to keep factored forms.1 Further simplifications are attempted: numbers are
combined, multiple appearances of an expression are combined into a power, multiple
powers with the same base expression are combined, etc.

gcf: The symbolic "greatest common factor" (GCF) between X and Y is determined.
(If X and Y have no factors in common, or one of them is a number, then the GCF is 1.)

add: The general method for adding two expressions X and Y is with the formula

$$X + Y = \mathrm{GCF}(X, Y)\left(\frac{X}{\mathrm{GCF}(X, Y)} + \frac{Y}{\mathrm{GCF}(X, Y)}\right)$$

After the GCF is factored out, the results are combined with make-sum. For example,
when the expressions A*X and B*X**2 are added, the result is X*(A + B*X).
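A minimal Python sketch of the GCF-based addition follows. For brevity it assumes monomials without numeric coefficients, each written as a map from symbol name to exponent; the function names are illustrative, not AUTOSIM's.

```python
from collections import Counter


def gcf(x, y):
    """Symbolic GCF of two monomials given as {symbol: exponent} maps."""
    return Counter({s: min(x[s], y[s]) for s in x.keys() & y.keys()})


def divide(x, g):
    q = Counter(x)
    q.subtract(g)
    return +q  # unary + drops symbols whose exponent fell to zero


def to_str(m):
    parts = [s if e == 1 else f"{s}**{e}" for s, e in sorted(m.items())]
    return "*".join(parts) or "1"


def add(x, y):
    """X + Y = GCF * (X/GCF + Y/GCF); a GCF of 1 is simply not written."""
    g = gcf(x, y)
    inner = f"{to_str(divide(x, g))} + {to_str(divide(y, g))}"
    return inner if not g else f"{to_str(g)}*({inner})"


print(add({"A": 1, "X": 1}, {"B": 1, "X": 2}))  # X*(A + B*X)
```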


1  There  are  applications  in  which  expanded  forms  are  preferred.  For  example,  stability  analyses  can 
require  coefficients  of  state  variables  and  their  products  and  powers.  The  AUTOSIM  software  does  include 
an  option  to  expand  expressions,  although  this  option  is  not  used  when  the  objective  is  to  automatically 
generate  simulation  codes. 
dot: The dot product operation is valid for two vectors, a vector and a dyad, or two
dyads. The method used for applying the operation is to recursively expand expressions
into multiplications and additions of subexpressions, and dot products of uv/dyad pairs.
This approach eventually expands the original dot product to an expression involving
operations defined for scalar algebra, together with dot products between unit-vectors.
Thus, the only new primitive operation needed is the dot product between two uvs.

Recall  that  the  uv  contains  a  slot  called  dot-products.  This  contains  a  table  with  all 
pairs  of  uvs  whose  dot  product  is  known.  Initially,  each  table  contains  three  entries  for 
the  three  uvs  in  the  body  in  which  the  uv  is  defined.  (The  values  are  1  for  the  dot 
product  of  the  uv  with  itself  and  0  for  the  other  two  uvs  of  the  triad.)  If  the  table 
contains  the  answer,  it  is  used.  Otherwise,  the  dot  product  is  between  two  uvs  associated 
with  different  bodies  that  have  not  yet  been  analyzed,  so  an  analysis  is  performed. 

Each body has a slot with a direction cosine matrix relating the uvs for that body
with the uvs of the parent. The uv whose body is furthest "down" the topology tree is
transformed into an expression involving the three uvs of its parent body. The dot
product is then taken between the new expression and the uv that was "up" the tree.

This  method  is  recursive — the  dot  operator  is  defined  in  terms  of  itself.  It  works, 
because  with  each  recursion,  the  expressions  being  considered  are  simpler,  and/or  the  uvs 
are  closer  in  the  tree.  Eventually,  the  process  is  guaranteed  to  stop  when  both  arguments 
are  uvs  associated  with  the  same  body. 

The  results  of  the  process  are  stored  in  the  table  of  dot-products  for  one  of  the  original 
uvs,  so  that  the  “tree-climbing”  and  transformations  (via  the  direction  cosine  matrices) 
are  not  required  the  next  time  the  dot  product  is  needed. 

The  method  of  “tree  climbing”  ensures  that  the  minimum  number  of  direction 
transformations  is  performed  for  each  dot  product  operation.  Thus,  trigonometric 
simplifications  are  not  required  for  this  operation. 
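The tree-climbing method can be sketched in Python with numbers in place of symbolic expressions (an assumption made to keep the example short). Body, dot_uv, and the single shared cache dictionary are hypothetical; AUTOSIM keeps a dot-products table in each uv instead. Row i of each body's matrix holds the coordinates of its unit-vector i in the parent's triad, shown here for a body yawed by an angle q about axis 3.

```python
import math


class Body:
    """A body in the topology tree; dcm[i][j] = (this body's uv i) . (parent's uv j)."""

    def __init__(self, name, parent=None, dcm=None):
        self.name = name
        self.parent = parent
        self.dcm = dcm
        self.level = 0 if parent is None else parent.level + 1


def dot_uv(a, b, table):
    """Dot product of unit-vectors a = (body, i) and b = (body, j)."""
    (body_a, i), (body_b, j) = a, b
    key = (body_a.name, i, body_b.name, j)
    if key in table:                      # already known: reuse it
        return table[key]
    if body_a is body_b:                  # same triad: orthonormal
        result = 1.0 if i == j else 0.0
    elif body_a.level >= body_b.level:    # a is further down: express it in its parent
        result = sum(body_a.dcm[i][k] * dot_uv((body_a.parent, k), b, table)
                     for k in range(3))
    else:                                 # b is further down: express it in its parent
        result = sum(body_b.dcm[j][k] * dot_uv(a, (body_b.parent, k), table)
                     for k in range(3))
    table[key] = result                   # store so the climb is not repeated
    return result


q = 0.3  # yaw angle (illustrative value)
n = Body("N")
nrb = Body("NRB", n, [[math.cos(q), math.sin(q), 0.0],
                      [-math.sin(q), math.cos(q), 0.0],
                      [0.0, 0.0, 1.0]])
print(dot_uv((nrb, 0), (n, 0), {}))  # equals math.cos(0.3)
```

Each recursive call either moves one argument one level up the tree or reaches a shared body, so the recursion terminates at the common ancestor.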

Note  that  the  dot-product  operator  makes  use  of  information  from  both  the  uv  object 
from  the  computer  algebra  part  of  the  system,  and  also  the  body  object  from  the 
multibody  part  of  the  system. 

cross: The cross product operation is performed using the same recursive approach as
described above for the dot product. A uv crossed with a uv is obtained from the table of
values in the cross-product slot of either uv if available (with a multiplication by -1 if the
table of the second uv is used). Otherwise, the cross product is formulated using the
expansion:

$$\mathbf{a} \times \mathbf{b} \rightarrow \left[(\mathbf{a}\cdot\mathbf{b}_1)\,\mathbf{b}_1 + (\mathbf{a}\cdot\mathbf{b}_2)\,\mathbf{b}_2 + (\mathbf{a}\cdot\mathbf{b}_3)\,\mathbf{b}_3\right] \times \mathbf{b} \qquad (9)$$

where a is the first uv, b is the second, and b1, b2, and b3 are the unit-vectors for the
body containing b. As was the case for the dot product, some of the information needed
to perform the operation is obtained from the body object in the body slot of the uv
object.
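A Python sketch of the expansion in eq. 9: once the dot products a · b_k are known (from the dot operator), the cross product reduces to crosses within b's own right-handed triad (b1 × b2 = b3, and so on). The function names, 0-based indices, and the use of plain numbers for the dot products are assumptions for illustration.

```python
def triad_cross(k, j):
    """Cross product of unit-vectors k and j of one right-handed triad
    (indices 0, 1, 2), returned as {index: coefficient}."""
    if k == j:
        return {}
    m = 3 - k - j                          # the remaining index
    sign = 1 if (j - k) % 3 == 1 else -1   # cyclic order gives +1
    return {m: sign}


def cross_with_uv(a_dots, j):
    """Expand a x b_j per eq. 9, where a_dots[k] = a . b_k.
    The result is expressed in b's triad as {index: coefficient}."""
    result = {}
    for k, coef in enumerate(a_dots):
        if coef == 0:
            continue
        for m, sign in triad_cross(k, j).items():
            result[m] = result.get(m, 0) + sign * coef
    return result


print(cross_with_uv([1.0, 0.0, 0.0], 1))  # {2: 1.0}, i.e. b1 x b2 = b3
```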
dxdt: The derivative of an arbitrary expression is determined using elementary rules
of calculus to recursively expand the expression into products and sums of simpler
expressions and their derivatives. The expansion stops when a sym, indexed-sym,
number, or uv is reached. The time derivative of a sym or indexed-sym is zero if
the sym is a constant; otherwise it is obtained from the dxdt slot.

The time derivative of a uv u is defined as

$$\frac{d\mathbf{u}}{dt} = \boldsymbol{\omega} \times \mathbf{u} \qquad (10)$$

where ω is the absolute rotational velocity of the body containing u, obtained from the
abs-w slot of the body found from the body slot of the uv.

There  are  other  ways  in  which  the  time  derivative  might  be  defined.  For  example, 
one  could  project  the  uv  into  the  coordinate  system  of  the  fixed  inertial  reference  and 
then  take  derivatives  of  the  scalar  components.  However,  eq.  10  has  two  strong 
advantages: 

1.  it  leads  to  simple  expressions,  matching  the  conventional  definition  of  the 
derivative  of  a  vector  fixed  in  a  rotating  reference  frame. 

2.  the  cross-product  operation  remains  valid  after  small  terms  have  been  dropped 
and  trigonometric  functions  have  been  replaced  with  truncated  Taylor  series. 
Thus,  simplifications  from  small  angles  and  small  speeds  can  be  made  as  soon  as 
the  small  quantities  appear  in  the  analysis  without  causing  errors  in  derivatives 
taken  later. 

After the absolute time derivative of an expression is derived, the result is put into the
dxdt slot for further reference.
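The scalar part of the dxdt recursion can be sketched in Python. The expression format is hypothetical: a number, a symbol name, or a tuple (op, arg, ...) with op in "sum", "prod", "sin", "cos". The speeds dictionary plays the role of the dxdt slots of variable syms, and cache stands in for storing derived results; unit-vector derivatives (eq. 10) are not modelled.

```python
def simplify_sum(terms):
    if not terms:
        return 0
    return terms[0] if len(terms) == 1 else ("sum", *terms)


def dxdt(exp, constants, speeds, cache):
    """Time derivative of a scalar expression tree (sketch)."""
    if isinstance(exp, (int, float)):
        return 0
    if isinstance(exp, str):
        return 0 if exp in constants else speeds[exp]
    if exp in cache:                         # derived before: reuse it
        return cache[exp]
    op, *args = exp
    if op == "sum":
        terms = [dxdt(a, constants, speeds, cache) for a in args]
        result = simplify_sum([t for t in terms if t != 0])
    elif op == "prod":                       # product rule: one term per non-constant factor
        terms = []
        for i, a in enumerate(args):
            da = dxdt(a, constants, speeds, cache)
            if da != 0:
                terms.append(("prod", *args[:i], da, *args[i + 1:]))
        result = simplify_sum(terms)
    elif op == "sin":
        d = dxdt(args[0], constants, speeds, cache)
        result = 0 if d == 0 else ("prod", ("cos", args[0]), d)
    elif op == "cos":
        d = dxdt(args[0], constants, speeds, cache)
        result = 0 if d == 0 else ("prod", -1, ("sin", args[0]), d)
    else:
        raise ValueError(f"unknown operator {op!r}")
    cache[exp] = result
    return result


print(dxdt(("prod", "A", ("sin", "Q3")), {"A"}, {"Q3": "U3"}, {}))
# ('prod', 'A', ('prod', ('cos', 'Q3'), 'U3'))
```

The partial operator described next would follow the same recursion without the cache.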

partial: Partial derivatives are obtained using the same basic process as used for
dxdt, except that results are not saved and partial derivatives of unit-vectors are zero.


Higher Level Operations

Table 7 lists mathematics operations that are derived from the above primitive
functions.

Table 7. Summary of higher level mathematics operations.

Operation   Argument(s)             Description
sub         exp1, exp2              negate exp2 and add to exp1
inv         exp                     make-power with exponent of -1
square      exp                     multiply expression with itself
div         exp1, exp2              invert exp2, then multiply with exp1
dot-plane   vexp1, vexp2            project vexp1 onto plane normal to vexp2
mag         vexp                    scalar magnitude of vector, v → √(v · v)
dir         vexp                    direction of vector, i.e., v/|v|
angle       vexp1, vexp2, (vexp3)   angle between vexp1 and vexp2, with sign determined by optional vexp3

Most  of  the  above  operators  have  standard  meanings  and  are  implemented  according 
to  their  definitions.  However,  two  are  introduced  to  aid  in  describing  quantities 
associated  with  vehicles  and  deserve  further  explanation. 

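dot-plane: This operator describes a short procedure in which a vector is projected
onto a plane. Consider vectors a and b. The new vector is defined as $\mathbf{a} \cdot (\mathbf{c}_1\mathbf{c}_1 + \mathbf{c}_2\mathbf{c}_2)$,
where

$$\mathbf{c}_1 = \frac{\mathbf{a} \times \mathbf{b}}{|\mathbf{a} \times \mathbf{b}|}, \qquad \mathbf{c}_2 = \frac{\mathbf{c}_1 \times \mathbf{b}}{|\mathbf{c}_1 \times \mathbf{b}|} \qquad (11)$$

Because c1 and c2 are orthogonal unit-vectors that are both normal to b, the dyad
c1c1 + c2c2 projects a onto the plane normal to b.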
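A numeric Python sketch of eq. 11, using plain tuples for vectors; the helper names are illustrative. Like eq. 11 itself, it is undefined when a is parallel to b, because a × b is then zero.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    mag = math.sqrt(dot(v, v))
    return tuple(x / mag for x in v)


def dot_plane(a, b):
    """Project a onto the plane normal to b: a . (c1 c1 + c2 c2), eq. 11."""
    c1 = unit(cross(a, b))
    c2 = unit(cross(c1, b))
    # a . (c1 c1 + c2 c2) = (a . c1) c1 + (a . c2) c2
    return tuple(dot(a, c1) * x + dot(a, c2) * y for x, y in zip(c1, c2))


print(dot_plane((1.0, 2.0, 3.0), (0.0, 0.0, 1.0)))  # approximately (1.0, 2.0, 0.0)
```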


angle: The angle between two vectors v1 and v2 is determined by defining three unit-vectors
and projecting one onto the other two to obtain an expression for the arctangent of
the angle. The steps are described below and illustrated in Figure 3:

1. The directions of the two vectors are obtained:

$$\mathbf{u}_1 = \frac{\mathbf{v}_1}{|\mathbf{v}_1|}, \qquad \mathbf{u}_2 = \frac{\mathbf{v}_2}{|\mathbf{v}_2|} \qquad (12)$$

2. A third direction is defined that is orthogonal to u1 and lies in the plane of v1 and v2:

$$\mathbf{u}_3 = (\mathbf{u}_1 \times \mathbf{u}_2) \times \mathbf{u}_1 \qquad (13)$$

3. The angle, θ, is defined as

$$\theta = \tan^{-1}\left(\frac{\mathbf{u}_3 \cdot \mathbf{u}_2}{\mathbf{u}_1 \cdot \mathbf{u}_2}\right)\,\mathrm{sign}\left(\mathbf{v}_3 \cdot [\mathbf{u}_1 \times \mathbf{u}_2]\right) \qquad (14)$$

This method is valid for angles of any size. Results are expressed using the Fortran
ATAN2 function, which accepts two arguments and is valid for the range
-180° ≤ θ ≤ +180°. The make-atan function is used to create the resulting expression,
with the possible simplifications noted earlier in Table 5. Note that an optional third
vector, v3, is used to establish the sign of the angle. (The sign function in eq. 14 has a
value of ±1, with a sign that matches that of its argument.)

[Figure 3: u3 = (u1 × u2) × u1]
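A numeric Python sketch of eqs. 12 to 14, with math.atan2 playing the role of Fortran ATAN2 (both take the y argument first). Two assumptions are labelled in the code. First, u3 is normalized, as the phrase "three unit-vectors" implies; without normalization the numerator would be sin²θ rather than sinθ. Second, when v3 is omitted, or when v3 · [u1 × u2] is zero, the sign is taken as +1.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    mag = math.sqrt(dot(v, v))
    return tuple(x / mag for x in v)


def angle(v1, v2, v3=None):
    """Signed angle between v1 and v2 (radians), following eqs. 12-14."""
    u1, u2 = unit(v1), unit(v2)            # eq. 12
    n = cross(u1, u2)
    w = cross(n, u1)                       # eq. 13, before normalization
    w_mag = math.sqrt(dot(w, w))
    # Assumption: normalize u3; if v1 and v2 are parallel, u3 is undefined and y = 0.
    y = dot(w, u2) / w_mag if w_mag > 1e-15 else 0.0
    theta = math.atan2(y, dot(u1, u2))     # eq. 14 without the sign factor
    if v3 is not None and dot(v3, n) < 0:  # assumption: sign(0) = +1
        theta = -theta
    return theta


print(math.degrees(angle((1, 0, 0), (-1, 1, 0))))  # about 135
```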
Multibody Operations

A few operations for dealing with points and bodies are useful for specifying forces,
moments, and dependent variables of interest. These are summarized in Table 8.


Table 8. Summary of AUTOSIM operations for bodies and points.

Operation   Argument(s)       Description
rot         body              rotational velocity of body
pos         point             position vector from origin of inertial reference to point
pos         point1, point2    position vector from point2 to point1
vel         point             velocity of point relative to the origin of the inertial reference
vel         point1, point2    velocity of point1 relative to point2

The  effect  of  vel  can  be  obtained  using  the  pos  operator  together  with  dxdt. 
However,  the  result  usually  involves  derivatives  of  generalized  coordinates,  whereas  the 
vel  function  provides  the  result  as  an  expression  involving  generalized  speeds. 


Accelerations  are  obtained  by  combining  the  dxdt  function  with  rot  and/or  vel. 

Operations  on  Program  Code 

The equation simplifications noted earlier (simplification techniques 8, 9, and 10) are
easy to implement after the simulation code has been generated and can be inspected.
For this reason, equations are not written as they are derived, but are kept in computer
memory as eqs objects.

Introduction  of  Intermediate  Variables  and  Constants 

The  simulation  code  generated  by  AUTOSIM  includes  two  sets  of  intermediate 
symbols  used  to  replace  expressions.  One  set  is  for  constant  expressions  and  the  other  is 
for  variables.  (Both  are  called  intermediate  variables  below,  since  that  is  how  they  are 
implemented in a Fortran program.) A function called intro-var-if-new is used to
process expressions and introduce new variables as needed. The method for doing this
involves a table, maintained by the system, of all expressions that have been
replaced by intermediate variables. The replacements are indexed-sym objects, which
print as elements of a Fortran array PC (for precomputed constants) or Z (for variables).
A simplified version of the algorithm in intro-var-if-new is as follows:

•  If the expression is an indexed-sym, a sym, or a number, it is returned.

•  Else, if the expression is in the table of existing intermediate variables, the
corresponding indexed-sym is returned.

•  Else, if the expression is a constant, define a new indexed-sym, put it at the end
of the list in the eqs object for intermediate constants, put the expression and
symbol into the table of intermediate variables, and return the new indexed-sym.

•  Else, if any constant expressions can be factored out, do so. Apply intro-var-if-new
to the constant part and the variable part, then apply intro-var-if-new to the product.

•  Else, apply intro-var-if-new to all components of the compound expression
(arguments in a func, factors in a prod, etc.), then continue.

-  If the expression is a prod, process the scalar factors two at a time. If the
prod included a factor that is a uv or dyad, skip over it (intermediate
variables are only used to represent scalar expressions). Multiply the first
two scalar factors and apply intro-var-if-new to the result. Multiply
the result with the next scalar factor and apply intro-var-if-new to
that result. Proceed until all scalar factors have been processed. The
definitions of the new indexed-syms are variables, and are placed at the
end of an eqs object used for the intermediate variables.

-  Else, introduce a new indexed-sym, put its definition at the end of the
appropriate eqs object, update the table, and return the new indexed-sym.

This  algorithm  is  recursive,  and  results  in  a  number  of  intermediate  expressions  being 
introduced  for  a  single  compound  expression.  For  example,  consider  the  expression 
A*(B*X  +  C*Y),  where  A,  B,  and  C  are  constants  and  X  and  Y  are  variables.  Processing 
this  expression  with  the  intro-var-if-new  function  leads  to  the  following  eqs 
object  for  intermediate  constants, 

    PC(1) = A*B
    PC(2) = A*C

and the following object for intermediate variables:

    Z(1) = PC(1)*X
    Z(2) = PC(2)*Y
    Z(3) = Z(1) + Z(2)

Note  that  the  number  of  multiplications  needed  to  compute  the  full  expression  has 
been  increased  from  3  in  the  original,  to  4  with  the  intermediate  variables.  However,  two 
of  the  new  multiplications  involve  constants,  leaving  only  two  multiplications  that  must 
be  performed  at  each  time  step  during  a  numerical  simulation  run. 
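The table-driven core of intro-var-if-new can be sketched in Python under several simplifying assumptions: expressions are canonical tuples such as ("prod", "A", "X") (standing in for the unique structures that make-prod and make-sum produce), atoms are symbol-name strings, and the class and method names are hypothetical. Constant factors of a prod are placed first so that the constant part becomes a PC entry, and products are built two factors at a time. The sketch does not distribute A over the sum as in the example above, so it is applied here to A*B*X + C*Y instead.

```python
class IntroVars:
    """Sketch of intro-var-if-new: replace subexpressions by PC(i) or Z(i)."""

    def __init__(self, constants):
        self.constants = set(constants)
        self.table = {}   # expression -> replacement symbol
        self.pc = []      # (symbol, definition) for intermediate constants
        self.z = []       # (symbol, definition) for intermediate variables

    def is_const(self, exp):
        if isinstance(exp, (int, float)):
            return True
        if isinstance(exp, str):
            return exp in self.constants or exp.startswith("PC(")
        return all(self.is_const(a) for a in exp[1:])

    def lookup_or_new(self, exp):
        if exp in self.table:
            return self.table[exp]
        if self.is_const(exp):
            sym = f"PC({len(self.pc) + 1})"
            self.pc.append((sym, exp))
        else:
            sym = f"Z({len(self.z) + 1})"
            self.z.append((sym, exp))
        self.table[exp] = sym
        return sym

    def intro(self, exp):
        if not isinstance(exp, tuple):        # sym, indexed-sym, or number
            return exp
        if exp in self.table:                 # already replaced earlier
            return self.table[exp]
        if self.is_const(exp):                # whole expression is constant
            return self.lookup_or_new(exp)
        op, *args = exp
        parts = [self.intro(a) for a in args]
        if op == "prod":
            # constant factors first, then multiply two factors at a time
            parts.sort(key=lambda p: not self.is_const(p))
            acc = parts[0]
            for p in parts[1:]:
                acc = self.lookup_or_new(("prod", acc, p))
            return acc
        return self.lookup_or_new((op, *parts))


iv = IntroVars(constants={"A", "B", "C"})
top = iv.intro(("sum", ("prod", "A", "B", "X"), ("prod", "C", "Y")))
for sym, definition in iv.pc + iv.z:
    print(sym, "=", definition)
# PC(1) = ('prod', 'A', 'B')
# Z(1) = ('prod', 'PC(1)', 'X')
# Z(2) = ('prod', 'C', 'Y')
# Z(3) = ('sum', 'Z(1)', 'Z(2)')
```

Processing the same expression a second time adds no new entries, since every intermediate product is found in the table.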

For the above algorithm to be effective, it is essential that expressions are uniquely
identified in the table. For example, if the product A*(1 + COS(Q(1))) occurs in one
place, we don't want an equivalent expression such as -(-COS(Q(1)) - 1)*A to occur in
another, because the search of the look-up table will not find the second occurrence. This
is why the make-prod and make-sum functions described earlier ensure that a given
product or sum always has the same structure.
The above algorithm always introduces a new intermediate variable whenever an
arithmetic operation or function evaluation occurs. For simple multibody systems, this
can sometimes degrade computational efficiency by eliminating possible simplifications
that occur by factoring. For example, consider an expression A*U(1) which is later
added to A*U(2). If both expressions are replaced by intermediate variables, say Z(5)
and Z(15), the sum is Z(5) + Z(15). It requires two multiplications, which occur when Z(5)
and Z(15) are computed. If the intermediate variables were not introduced, the result of
the addition would be A*(U(1) + U(2)), an expression with only one multiplication.

There  are  some  reasons  not  to  introduce  a  new  intermediate  variable  if  that  variable 
will  only  be  used  once.  First,  the  equations  become  almost  unreadable  by  humans.  The 
equations  are  usually  complicated  to  begin  with,  and  introducing  intermediate  variables 
that  only  appear  once  compounds  the  difficulty.  Second,  some  Fortran  compilers 
optimize  machine  instructions  for  large  expressions,  putting  temporary  intermediate 
results  directly  into  working  registers.  For  machines  with  vector  processing  or  other 
parallel  computing  capabilities,  other  techniques  are  available  for  the  compiler.  If  an 
intermediate  variable  is  defined  in  the  source  code,  the  compiler  is  obliged  to  save  its 
value  by  moving  it  into  a  RAM  location.  For  these  reasons,  the  method  described  below 
for  removing  unused  code  is  extended  to  also  eliminate  any  intermediate  variables  that 
would  only  be  used  once. 

Removal  of  Unused  Code 

Before  the  equations  are  written  as  output  into  a  Fortran  program,  they  are  inspected 
for  intermediate  variables  that  are  never  used,  or  used  only  once.  Only  equations  that 
contribute  to  the  computation  of  the  accelerations  or  to  the  computation  of  output 
variables  are  actually  written  into  the  simulation  code  that  is  generated  by  AUTOSIM. 

An important part of the design of AUTOSIM is that the three symbolic elements (the sym,
the indexed-sym, and the uv) are stored in memory such that there are no
copies; for example, the object called "Q(2)" exists in only one place, even though it
appears in more than one expression.1 Recall that one of the slots in the sym object is
called hide. The hide slot is used for removing unused code by keeping count of how
many times the sym actually appears. The eqs object only prints equations involving
syms whose hide slots are not set to 0. For example, if an eqs contains 100 equations,
but only 10 involve syms with hide counts greater than 0, then only 10 equations are
printed. The other 90 equations are still in memory, but are hidden.

To count occurrences, the hide slots in all intermediate variables in an eqs are set to
0, and then the equations used to compute derivatives and output variables are processed with
a function called validate-exp. The validate-exp function operates recursively
to "validate" syms. If its argument is a sym or indexed-sym, it increments the count
in the hide slot, and then applies itself recursively to the expression on the right-hand side
of the equation (available from the exp slot). If the argument is a compound expression,
validate-exp applies itself to all of the parts of the expression (arguments in a
func, factors in a prod, etc.).

1 Lisp uses pointers to reference such objects when they are "contained" in other objects. Thus, when
an elementary object is changed, all expressions "containing" that element are updated, since their pointers
continue to point at the changed object.

After the hide values have been established for all indexed-syms that appear on the
left-hand side of an equation, a second pass is made in which all intermediate variables
that are used only once (hide = 1) are expanded back into the original expressions.
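A Python sketch of the two passes, with hypothetical names (count_uses, inline_single_use) and definitions held in a dict that maps each intermediate-variable name to an expression tuple. One assumption goes beyond a literal reading of the text: the sketch descends into a variable's definition only the first time that variable is counted, so each reference inside the equations that will actually be printed is counted once.

```python
def count_uses(outputs, defs):
    """First pass: hide counts for intermediate variables reachable from outputs."""
    hide = {name: 0 for name in defs}

    def validate(exp):
        if isinstance(exp, str) and exp in defs:
            hide[exp] += 1
            if hide[exp] == 1:          # first use: validate its definition once
                validate(defs[exp])
        elif isinstance(exp, tuple):    # compound: (op, arg1, arg2, ...)
            for arg in exp[1:]:
                validate(arg)

    for exp in outputs:
        validate(exp)
    return hide


def inline_single_use(exp, defs, hide):
    """Second pass: expand variables used exactly once back into their definitions."""
    if isinstance(exp, str) and hide.get(exp) == 1:
        return inline_single_use(defs[exp], defs, hide)
    if isinstance(exp, tuple):
        return (exp[0],) + tuple(inline_single_use(a, defs, hide) for a in exp[1:])
    return exp


defs = {"Z1": ("prod", "A", "X"), "Z2": ("sum", "Z1", "Y"),
        "Z3": ("prod", "Z1", "Z2"), "Z4": ("prod", "B", "X")}
hide = count_uses(["Z3"], defs)            # one output equation uses Z3
print(hide)                                # {'Z1': 2, 'Z2': 1, 'Z3': 1, 'Z4': 0}
print(inline_single_use("Z3", defs, hide)) # ('prod', 'Z1', ('sum', 'Z1', 'Y'))
```

Here Z4 stays hidden (count 0), Z2 and Z3 are expanded in place, and only Z1 remains as a printed intermediate variable.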

Example  Vehicle  System 

A  three  dimensional  vehicle  example  will  now  be  used  to  illustrate  some  of  the  above 
methods.  The  vehicle  is  shown  conceptually  as  a  multibody  system  in  Figure  4. 
Although the model is relatively simple, it has been shown to predict steering responses
that closely match measurements from the test track [16].


The coordinate system of the inertial reference has its origin in the plane of the road,
with axes along directions [N1], [N2], and [N3], where the unit-vector [N3] points
down.1

The vehicle is modelled as two rigid bodies. One is body NRB, which is free to translate
and yaw in the plane of the road. The second is body RB, which rolls relative to NRB
about a roll axis tilted as shown. This model nominally has four degrees of freedom.
However, the forward speed is set constant, limiting the dynamical degrees of freedom to
three. The vehicle responds to two applied side forces from the tires (Fy1 and Fy2), two
aligning moments (Mz1 and Mz2), gravity, and a roll moment generated by the suspension
springs and dampers.


1  In  AUTOSIM  unit-vectors  are  written  enclosed  with  square  brackets. 
All of the ingredients of the model can be represented using the computer data objects
presented earlier. To generate the objects automatically, some Lisp macros and functions
are included in AUTOSIM to build the description of the multibody system on the
computer. These are summarized in Table 9.


Table 9. Summary of AUTOSIM commands for building a multibody system description.

Command              Purpose
add-body             describe one body completely, including its position in the system topology, the kinematics of its joint, and the mass and inertial properties of its rigid body
add-gravity          apply gravitational force to each body with mass
add-line-force       describe force-producing component (direction of force is known)
add-moment           describe moment-producing component
add-point            identify point of interest on a body
add-strut            describe force-producing component (end points are known)
reset                initialize AUTOSIM and clear previous results
set-speed-constant   specify that a generalized speed is a constant (and thus remove a dynamical degree of freedom)
small                declare syms to have a small-order of 1

The  example  system  is  described  using  these  macros  in  the  listing  shown  in  Figure  5. 
Note  that  most  of  the  information  provided  in  this  paper  is  not  needed  to  prepare  the 
inputs  shown  in  the  listing.  The  inputs  needed  to  model  the  example  system  involve  just 
eight different macros with a fairly simple syntax. The entries are Lisp forms, as
described in any Lisp textbook. The types of Lisp data used in the macros are the
symbol, string, array, number, and the F-string: a Fortran-style expression entered as a
string preceded by an exclamation mark, e.g., !"-kroll*q(4) - croll*u(3)".

Advanced users can use the programming power of Lisp to define additional variables,
use DO loops, etc. However, a knowledge of Lisp is not required to use AUTOSIM.

The specific lines of input shown in the listing of Figure 5 are now described briefly.
Every multibody system begins with the inertial reference (N), which in turn contains one
point, O, the origin. These objects are established with the form (reset), which also
initializes many of the global objects used in AUTOSIM.

The first add-body macro in Figure 5 tells several things about the new body to
AUTOSIM: (1) the new body has the inertial reference N as its "parent," (2) the symbolic
name for the new body is NRB, (3) a more descriptive name to use in documentation is
"non-rolling body," (4) NRB has two translational degrees of freedom relative to the
inertial reference, in the directions of axes 1 and 2 ([N1] and [N2]), (5) the center of mass
of NRB is a distance HRA above the ground, and (6) NRB has a single rotational degree
of freedom about axis 3 ([N3]).
(reset)

(small thetar ixz)

(add-body n nrb
  :name "non-rolling body"
  :translate (1 2)
  :cm-coords #(0 0 !"-hra")
  :parent-rot-axis 3)

(add-body nrb rb
  :name "rolling body"
  :body-rot-axes (1)
  :parent-rot-axis
    #(!"cos(thetar)" 0 !"sin(thetar)")
  :small-angles (t)
  :joint-coords #(0 0 !"-hra")
  :cm-coords #(ce 0 !"-h")
  :inertia-matrix
    #2a((Ixx 0 Ixz)
        (0 Iyy 0)
        (Ixz 0 Izzr)))

(add-point front
  "the front axle point" nrb
  #(L 0 0))

(small (dot (rot nrb) '[n3])
       (dot (vel nrb0) '[nrb2]))

(set-speed-constant
  (dot (vel nrb0) '[nrb1]) speed)

(add-line-force fy1
  "Side force, front axle"
  !"ca1*(angle([nrb1], vel(front), [nrb3]) - steer) - cg1*ccoef1*q(4)"
  front [nrb2] o :no-forcem t)

(add-line-force fy2
  "Side force, rear axle"
  !"ca2*(angle([nrb1], vel(nrb0), [nrb3]) - krs2*q(4))"
  nrb0 [nrb2] o :no-forcem t)

(add-moment mz1
  "Aligning moment, front axle"
  [n3]
  !"cam1*(angle([nrb1], vel(front), [nrb3]) - steer)"
  nrb n :no-forcem t)

(add-moment mz2
  "Aligning moment, rear axle"
  [n3]
  !"cam2*(angle([nrb1], vel(nrb0), [nrb3]) - krs2*q(4))"
  nrb n :no-forcem t)

(add-moment rollm
  "roll moment from suspension"
  [rb1]
  !"-Kroll*q(4) - Croll*u(3)"
  rb nrb :no-forcem t)

(add-gravity)

Figure 5. Description of car model in AUTOSIM.


The second add-body macro designates NRB as the parent body and names the new
body RB. Further, it indicates that (1) the descriptive name of RB is "rolling body," (2)
there is a single rotational degree of freedom, aligned with axis 1 of the coordinate system
of RB, [RB1], (3) the rotation axis is oriented in the direction whose coordinates (in the
frame of the parent NRB) are (COS(THETAR), 0, SIN(THETAR)) (that is, the axis is
inclined down from axis 1 by an angle THETAR towards axis 3), (4) the rotation
involves a small angle, (5) the origin of the coordinate system of RB is located at
coordinates (0, 0, -HRA) in the coordinate system of NRB, (6) the center of mass is
located at coordinates (CE, 0, -H) in the coordinate system of RB, and (7) the inertia
matrix for RB is

$$\begin{bmatrix} \mathrm{IXX} & 0 & \mathrm{IXZ} \\ 0 & \mathrm{IYY} & 0 \\ \mathrm{IXZ} & 0 & \mathrm{IZZR} \end{bmatrix}$$


The AUTOSIM design permits both variables and parameters to be "small." In this
example, the parameters THETAR and IXZ were declared to be "small" before the add-body
macros were applied.1 Thus, when IXZ is multiplied with a "small" variable and the
product is added to another parameter, the product is recognized as being numerically
negligible and is dropped from the sum.

The  body  objects  created  to  represent  these  two  bodies  are  printed  in  Figure  6  to 
show  the  values  associated  with  some  of  the  slots. 


Summary of body: NRB
  parent: NRB's parent is N
  level: 1
  children: (RB)
  Name: Non-Rolling Body
  mass: NRBM
  inertia: IZZNR*([N3].[N3])
  unit-vectors: #([NRB1] [NRB2] [N3])
  new-trans-vars: (Q(1) Q(2))
  new-trans-speeds: (U(1) U(2))
  new-rot-vars: (Q(3))
  new-rot-speeds: (U(3))
  rot-dir-list: ([N3])
  trans-dir-list: ([N1] [N2])
  joint-pos: (Point NRBJ: Body N: #(0 0 0): attachment point for the non-rolling body)
  cm-pos: (Point NRBCM: Body NRB: #(0 0 -HRA): center of mass of the non-rolling body)
  abs-w: U(3)*[N3]
  abs-v: (U(1)*[NRB1] + U(2)*[NRB2])
  cos matrix: #(COS(Q(3)) SIN(Q(3)) 0)
              #(-SIN(Q(3)) COS(Q(3)) 0)
              #(0 0 1.0)

Summary of body: RB
  parent: NRB
  level: 2
  children: NIL
  Name: Rolling Body
  mass: RBM
  inertia: (IXX*([RB1].[RB1]) + IXZ*([RB3].[RB1]) + IXZ*([RB1].[RB3])
            + IYY*([RB2].[RB2]) + IZZR*([RB3].[RB3]))
  unit-vectors: #([RB1] [RB2] [RB3])
  new-rot-vars: (Q(4))
  new-rot-speeds: (U(4))
  rot-dir-list: ([RB1])
  joint-pos: (Point RBJ: Body NRB: #(0 0 -HRA): attachment point for the rolling body)
  cm-pos: (Point RBCM: Body RB: #(CE 0 -H): center of mass of the rolling body)
  abs-w: (U(3)*[N3] + U(4)*[RB1])
  abs-v: (U(1)*[NRB1] + U(2)*[NRB2])
  cos matrix: #(1.0 0 THETAR)
              #(-THETAR*Q(4) 1.0 Q(4))
              #(-THETAR -Q(4) 1.0)

Figure 6. Description of body structures for example vehicle.


The  macros  introduced  point  objects,  generalized  coordinates,  generalized  speeds, 
and  a  direction  cosine  matrix  based  on  the  degrees  of  freedom.  Because  the  parameter 
THETAR  and  the  roll  rotation  angle  are  both  “small”  angles,  the  direction  cosine  matrix 
of  RB  does  not  include  any  trigonometric  functions. 

Note that the matrix includes the product -THETAR*Q(4), which is of order 2. The
reason this is included (rather than the number 0) is that a small expression is not set to
zero unless it is added to another expression of an order that is lower by two or more. For
example, if THETAR*Q(4) (order=2) is added to the number 1 (order=0), the result is 1.

1 The macro operates by finding the syms with the names THETAR and IXZ (or creating them if they
don't already exist), and then setting the small-order slot of each to a value of 1.
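The truncation rule just described can be sketched in Python. The small-order table is a hypothetical stand-in for the small-order slots set by the small macro, "Q4" stands for Q(4), and each term of a sum is written as a tuple of its factors; the order of a product is taken as the sum of the orders of its factors.

```python
# Hypothetical small-order declarations, standing in for the small-order slots.
SMALL_ORDER = {"THETAR": 1, "IXZ": 1, "Q4": 1}


def order(term):
    """Small-order of a product term given as a tuple of factors (numbers are order 0)."""
    return sum(SMALL_ORDER.get(f, 0) for f in term if isinstance(f, str))


def drop_negligible(terms):
    """Keep only terms whose order is less than two above the lowest order in the sum."""
    lowest = min(order(t) for t in terms)
    return [t for t in terms if order(t) < lowest + 2]


print(drop_negligible([(1,), ("THETAR", "Q4")]))  # [(1,)]
print(drop_negligible([(1,), ("Q4",)]))           # [(1,), ('Q4',)]
```

A lone order-2 term such as THETAR*Q(4) survives, which is why it still appears in the direction cosine matrix of RB.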

The  definitions  of  the  state  variables  can  also  be  printed  (the  add-body  macro 
creates  a  name  for  every  symbol  it  introduces,  and  also  sets  the  units).  The  summaries 
printed  by  AUTOSIM  are  shown  in  Figure  7. 


Generalized  Coordinates 

Q(1): Translation of NRB relative to the attachment point for the non-rolling body along
[n1]. (in)

Q(2): Translation of NRB relative to the attachment point for the non-rolling body along
[n2]. (in)

Q(3): Rotation of the non-rolling body relative to the inertial reference about axis #3.
(deg)

Q(4): Rotation of the rolling body relative to the non-rolling body about axis #1. (deg)

Generalized Speeds (before set-speed-constant macro is used)

U(1): Abs. trans. speed of NRB along axis 1. (in/s)

U(2):  Abs.  trans.  speed  of  NRB  along  axis  2.  (in/s) 

U(3):  Abs.  rotation  of  NRB,  axis  3.  (deg/s) 

U(4):  Rotation  of  RB  relative  to  NRB,  axis  1.  (deg/s) 

Figure  7.  Printed  summary  of  state  variables. 

Note that the generalized speeds defined for translational velocity are not
derivatives of the generalized coordinates.

The macro set-speed-constant removes a dynamical degree of freedom by
changing slot values in the indexed-sym object that represents a generalized speed,
and then renumbering the remaining speeds. The renumbering is performed by changing
the i slot in all indexed-sym objects that represent generalized speeds. In this example,
the forward vehicle speed, initially printed as "U(1)", is declared to be a constant called
SPEED. The macro changes the const-or-var slot to const, the dxdt slot to 0, the exp
slot to SPEED, and the i slot to 0.

Printing of expressions is performed recursively, with every type of object having an associated print function. If an object is changed so that it prints differently, all expressions containing that object will also print in the “updated” form. The print function associated with indexed-sym objects prints the expression in the exp slot if the i value is 0. Thus, all expressions that contain the generalized speed originally named U(1) will now print that object as “SPEED.” The generalized speeds have been renumbered and appear as shown in Figure 8.
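Below is a minimal Python sketch of the slot changes and the print rule. It assumes a simplified indexed-sym record. The field names mirror the slots named in the text, but the class itself is hypothetical and is not AUTOSIM's data structure. Applied to the four speeds of Figure 7, it reproduces the names listed in Figure 8.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexedSym:
    name: str                  # base name, e.g. "U"
    i: int                     # index slot; 0 means "print the exp slot instead"
    const_or_var: str = "var"
    dxdt: Optional[str] = None
    exp: Optional[str] = None

    def __str__(self) -> str:
        # Print rule described in the text: if i is 0, print the exp slot.
        return str(self.exp) if self.i == 0 else f"{self.name}({self.i})"


def set_speed_constant(speeds: List[IndexedSym], target: IndexedSym, constant: str) -> None:
    # Change the slots of the chosen speed as described in the text ...
    target.const_or_var = "const"
    target.dxdt = "0"
    target.exp = constant
    target.i = 0
    # ... then renumber the i slot of the remaining (still variable) speeds.
    k = 1
    for s in speeds:
        if s.i != 0:
            s.i = k
            k += 1


speeds = [IndexedSym("U", j) for j in range(1, 5)]
set_speed_constant(speeds, speeds[0], "SPEED")
print([str(s) for s in speeds])  # ['SPEED', 'U(1)', 'U(2)', 'U(3)']
```

Because every expression prints its speeds through `__str__`, renaming happens in one place and propagates everywhere, which is the behavior the text attributes to AUTOSIM's recursive printing.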

Generalized Speeds (After set-speed-constant macro is used)

SPEED: Abs. trans. speed of NRB along axis 1. (in/s)

U(1): Abs. trans. speed of NRB along axis 2. (in/s)

U(2): Abs. rotation of NRB, axis 3. (deg/s)

U(3): Rotation of RB relative to NRB, axis 1. (deg/s)

Figure 8. Printed summary of generalized speeds after speed is set constant.

Because AUTOSIM will freely rename objects, the analyst must be careful when referring to state variables by name. For example, the speed that was originally called U(2) is now U(1). The possibility of erroneously naming the wrong variable can be eliminated by referring to speeds generically as scalar projections (dot products) of velocities. The small and set-speed-constant macros near the bottom of the left column in Figure 5 illustrate this. The small macro is applied to two expressions. The first is the rotational velocity of NRB dotted with the unit vector [N3] (i.e., the yaw rate). The second is the velocity of the origin of NRB dotted with the unit vector [NRB2] (i.e., the lateral component of the velocity). The set-speed-constant macro is applied to the speed defined as the velocity of the origin of NRB dotted with the unit vector [NRB1] (i.e., the forward component of the velocity). These expressions are always valid and do not require knowledge of how the speeds are currently named. For example, if more degrees of freedom were added to the model, so that the generalized speeds were numbered differently, the generic descriptions in the listing of Figure 5 would still be correct.
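Naming a speed by its definition rather than by its index can be sketched as a lookup keyed on the (velocity, unit vector) pair. The `Speed` record and string tags such as `vel(NRB0)` are hypothetical stand-ins for AUTOSIM's vector expressions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Speed:
    label: str      # current printed name; may change when speeds are renumbered
    vector: str     # hypothetical tag for the velocity being projected
    direction: str  # unit vector used in the dot product


def find_speed(speeds: List[Speed], vector: str, direction: str) -> Speed:
    """Look a speed up by its definition (a dot product), not by its current name."""
    for s in speeds:
        if (s.vector, s.direction) == (vector, direction):
            return s
    raise KeyError(f"no speed defined as {vector} . {direction}")


speeds = [
    Speed("SPEED", "vel(NRB0)", "[NRB1]"),
    Speed("U(1)", "vel(NRB0)", "[NRB2]"),
    Speed("U(2)", "rot(NRB)", "[N3]"),
]
lateral = find_speed(speeds, "vel(NRB0)", "[NRB2]")
print(lateral.label)            # U(1)
lateral.label = "U(5)"          # simulate a later renumbering
print(find_speed(speeds, "vel(NRB0)", "[NRB2]").label)  # U(5)
```

The lookup keeps working after the label changes, which is the property the text relies on.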

The  macro  add-point  is  used  to  define  a  point  called  front  at  which  the  front  tire 
force  is  applied.  The  macros  add-line-force  and  add-moment  are  used  to  define 
tire  forces  and  moments. 

The arguments to the add-line-force macro are:

(1) a symbol for the force object (to go in the symbol slot);
(2) a name for the force (to go in the name slot);
(3) an expression for the magnitude of the force (to go in the exp slot);
(4) a point upon which the force acts (to go in the point1 slot);
(5) a direction associated with the force (to go in the dir slot);
(6) a point associated with a body from which the force is reacted. The point itself is not saved; however, the body associated with the point goes into the body2 slot.

The body1 slot is assigned the body associated with the point in point1, and the point2 slot is NIL.
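The slot assignments made by add-line-force can be summarized in a short Python sketch. The record types, the tire-force symbol `FYF`, the magnitude string and the ground point `N0` are illustrative assumptions, not AUTOSIM definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Point:
    name: str
    body: str  # body the point is fixed in


@dataclass
class LineForce:
    symbol: str
    name: str
    exp: str                        # magnitude expression
    point1: Point                   # point the force acts on
    direction: str                  # the "dir" slot
    body1: str
    body2: str
    point2: Optional[Point] = None  # left as NIL (None) by add-line-force


def add_line_force(symbol, name, magnitude, point, direction, reaction_point) -> LineForce:
    # Only the body of the reaction point is kept (body2); the point itself is not saved.
    return LineForce(symbol=symbol, name=name, exp=magnitude, point1=point,
                     direction=direction, body1=point.body,
                     body2=reaction_point.body, point2=None)


front = Point("FRONT", "NRB")
ground = Point("N0", "N")
f = add_line_force("FYF", "front tire lateral force", "-CA1*SLIPF", front, "[NRB2]", ground)
print(f.body1, f.body2, f.point2)  # NRB N None
```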

The most complicated of the above arguments is the one for the force magnitude. The expression involves the slip angle for a point, defined as the angle between the forward direction of the point and the velocity vector of the point. The slip angle is defined for the front tires with the F-string

!"angle([nrb1], vel(front), [nrb3]) - steer"

AUTOSIM parses (interprets) the F-string as follows:

1. Derive an expression for the angle between the forward direction [nrb1] and the velocity of the point named front, vel(front).
2. Take a positive angle to be a clockwise sweep, as seen by an observer looking in the direction [nrb3].
3. Subtract steer from that angle.

The expression obtained by AUTOSIM is

(STEER - (U(1) + L*U(2))/SPEED)
Once the system is described to AUTOSIM, the equations of motion are derived by a function named dynamics, and the simulation code is generated with a function named write-sim. Sections of Fortran code generated by AUTOSIM are shown in the following figures. Before the code removal, the equations of motion include 83 intermediate variables. A portion of the code is shown in Figure 9 as it appears before unused intermediate variables are removed.


C     Equations of Motion
C
      C(3) = COS(Q(3))
      S(3) = SIN(Q(3))
C
C     Kinematical equations
C
      Z(1) = SPEED*C(3)
      Z(2) = U(1)*S(3)
      QP(1) = (Z(1) -Z(2))
      Z(3) = U(1)*C(3)
      Z(4) = SPEED*S(3)
      QP(2) = (Z(3) + Z(4))
      QP(3) = U(2)
      QP(4) = U(3)
C
C     Dynamical equations
C
      Z(5) = Q(4)*U(2)
      Z(6) = THETAR*U(2)
      Z(7) = (U(3) + Z(6))
      Z(8) = U(3)*Z(5)
      Z(9) = U(3)*U(2)
      Z(10) = CE*Q(4)
      ...
      Z(80) = PC(26)*Z(76)
      Z(81) = PC(39)*Z(79)
      Z(82) = (-Z(70) + Z(80) + Z(81))
      Z(83) = PC(56)*Z(82)
      UP(1) = -Z(76)
      UP(2) = -Z(79)
      UP(3) = Z(83)

Figure  9.  Portions  of  simulation  code  before  code  removal. 

After unused code is removed and intermediate variables that appear but once are eliminated, only five intermediate variables are retained. The listing of the Fortran assignment statements actually written into the simulation code is shown in Figure 10. The indexed-sym objects that are printed as the Fortran arrays PC and Z are renumbered, so the specific array elements in the listing of Figure 10 are not the same as those in Figure 9.
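The two clean-up steps described above are dropping unused intermediate variables and eliminating those that appear only once. They can be sketched as a backward liveness pass followed by an inlining pass. This Python version operates on (target, right-hand side) string pairs and recognizes only `Z(k)` references. It is an illustration under those assumptions, not AUTOSIM's implementation. The sample statements are taken from the kinematical part of Figure 9, with Z(9) included and treated as unused purely for illustration.

```python
import re
from collections import Counter
from typing import List, Tuple

Stmt = Tuple[str, str]              # (target, right-hand side)
Z_REF = re.compile(r"\bZ\(\d+\)")   # references to intermediate variables


def remove_unused(stmts: List[Stmt], outputs: List[str]) -> List[Stmt]:
    """Backward pass: keep only statements whose targets are needed by the outputs."""
    needed = set(outputs)
    kept = []
    for target, rhs in reversed(stmts):
        if target in needed:
            kept.append((target, rhs))
            needed.update(Z_REF.findall(rhs))
    kept.reverse()
    return kept


def inline_single_use(stmts: List[Stmt]) -> List[Stmt]:
    """Substitute intermediate variables that are referenced exactly once."""
    uses = Counter(ref for _, rhs in stmts for ref in Z_REF.findall(rhs))
    pending = {}
    result = []
    for target, rhs in stmts:
        rhs = Z_REF.sub(lambda mt: f"({pending.pop(mt.group(0))})"
                        if mt.group(0) in pending else mt.group(0), rhs)
        if Z_REF.fullmatch(target) and uses[target] == 1:
            pending[target] = rhs      # defer: it will be pasted into its single use
        else:
            result.append((target, rhs))
    return result


stmts = [("Z(1)", "SPEED*C(3)"), ("Z(2)", "U(1)*S(3)"),
         ("Z(9)", "U(3)*U(2)"), ("QP(1)", "Z(1) - Z(2)")]
code = inline_single_use(remove_unused(stmts, ["QP(1)"]))
print(code)  # [('QP(1)', '(SPEED*C(3)) - (U(1)*S(3))')]
```

The result matches the form of QP(1) in Figure 10. Intermediates used more than once, such as Z(1) in the dynamical equations of Figure 10, are left in place.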

The listing in Figure 10 requires 24 multiply/divides, 18 add/subtracts, and 2 function evaluations to compute the derivatives. In addition, 37 constant expressions were identified and are precomputed; the code to compute them is listed in Figure 11.
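Operation counts of the kind quoted above can be tallied mechanically. The sketch below counts multiplies/divides, binary adds/subtracts and intrinsic function calls in a Fortran right-hand side. It rests on several simplifying assumptions:

- the list of recognized intrinsics;
- treating `**` as a single operation;
- not counting unary minus;
- ignoring exponent notation such as `1.0E-3`.

Applied to the right-hand sides of Figure 10, as transcribed in this sketch, these conventions give the totals stated in the text.

```python
import re

FUNC = re.compile(r"\b(?:SIN|COS|TAN|ATAN2?|SQRT|EXP|LOG)\(")


def count_ops(rhs: str):
    """Return (multiplies/divides, adds/subtracts, function calls) for one Fortran RHS."""
    s = rhs.replace(" ", "")
    muls = adds = 0
    i = 0
    while i < len(s):
        if s.startswith("**", i):
            muls += 1        # assumption: count x**k as a single operation
            i += 2
            continue
        ch = s[i]
        if ch in "*/":
            muls += 1
        elif ch in "+-" and i > 0 and (s[i - 1].isalnum() or s[i - 1] in ")."):
            adds += 1        # binary +/-; a sign after "(" or at the start is unary
        i += 1
    return muls, adds, len(FUNC.findall(s))


FIG10_RHS = [
    "COS(Q(3))", "SIN(Q(3))",
    "(SPEED*C(3) -U(1)*S(3))", "(U(1)*C(3) + SPEED*S(3))", "U(2)", "U(3)",
    "PC(34)*U(2)",
    "(PC(16)*Q(4) + CROLL*U(3) + H*Z(1))",
    "(PC(14) + PC(38)*Q(4) -PC(13)*U(1) -PC(12)*U(2) + PC(1)*Z(1) -PC(19)*Z(2))",
    "PC(31)*(PC(2) -PC(8)*Q(4) -PC(9)*U(1) +PC(35)*U(2) +Z(1) -PC(20)*Z(2) -PC(25)*Z(3))",
    "PC(32)*(Z(3) -PC(27)*Z(4))",
    "-Z(4)", "-Z(5)",
    "(-PC(33)*Z(2) + PC(39)*Z(4) + PC(40)*Z(5))",
]

totals = tuple(sum(col) for col in zip(*(count_ops(r) for r in FIG10_RHS)))
print(totals)  # (24, 18, 2)
```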

Details of this analysis and the computational efficiency of AUTOSIM have been presented elsewhere, and the simplification techniques were shown to influence the computational efficiency by almost a factor of 50. Using only the simplification techniques numbered 1, 2, 3, and 10, a total of 878 multiplies, divides, and function evaluations are needed to compute the derivatives at each time step. The best efficiency occurred using all techniques except no. 7, in which case the number of multiplies, divides, and function evaluations was reduced to 19 [17].

Summary 

Automated  modeling  of  multibody  systems  has  usually  offered  convenience  over  the 
alternative  of  formulating  equations  of  motion  by  hand  and  writing  a  specialized 
simulation  code  to  solve  them.  However,  there  have  been  trade-offs.  Numerical 


C     Equations of Motion
C
      C(3) = COS(Q(3))
      S(3) = SIN(Q(3))
C
C     Kinematical equations
C
      QP(1) = (SPEED*C(3) -U(1)*S(3))
      QP(2) = (U(1)*C(3) + SPEED*S(3))
      QP(3) = U(2)
      QP(4) = U(3)
C
C     Dynamical equations
C
      Z(1) = PC(34)*U(2)
      Z(2) = (PC(16)*Q(4) + CROLL*U(3) + H*Z(1))
      Z(3) = (PC(14) + PC(38)*Q(4) -PC(13)*U(1) -PC(12)*U(2) +
     &  PC(1)*Z(1) -PC(19)*Z(2))
      Z(4) = PC(31)*(PC(2) -PC(8)*Q(4) -PC(9)*U(1) +PC(35)*U(2) +Z(1)
     &  -PC(20)*Z(2) -PC(25)*Z(3))
      Z(5) = PC(32)*(Z(3) -PC(27)*Z(4))
      UP(1) = -Z(4)
      UP(2) = -Z(5)
      UP(3) = (-PC(33)*Z(2) + PC(39)*Z(4) + PC(40)*Z(5))

Figure 10. Listing of code generated to compute derivatives of state variables for example vehicle model.


      PC(1) = (CE + H*THETAR)
      PC(2) = CA1*STEER
      PC(3) = L*CA1/SPEED
      PC(4) = RBM*GEES
      PC(5) = NRBM*SPEED
      PC(6) = RBM*(CE + H*THETAR)
      PC(7) = H*RBM
      PC(8) = (-CA2*KRS2 + CG1*CCOEF1)
      PC(9) = (CA1 + CA2)/SPEED
      PC(10) = (RBM + NRBM)
      PC(11) = (CAM2*KRS2 - L*CG1*CCOEF1)
      PC(12) = (L*CAM1 + L*CA1)/SPEED
      PC(13) = (CAM1 + CAM2 + L*CA1)/SPEED
      PC(14) = STEER*(CAM1 + L*CA1)
      PC(15) = (IZZR + IZZNR + RBM*(CE + H*THETAR)**2)
      PC(16) = (KROLL - H*RBM*GEES)
      PC(17) = (IXZ + IXX*THETAR + H*RBM*(CE + H*THETAR))
      PC(18) = (IXX + RBM*H**2)
      PC(19) = PC(17)/PC(18)
      PC(20) = PC(7)/PC(18)
      PC(21) = PC(17)*PC(19)
      PC(22) = (PC(15) - PC(21))
      PC(23) = PC(17)*PC(20)
      PC(24) = (PC(6) - PC(23))
      PC(25) = PC(24)/PC(22)
      PC(26) = PC(7)*PC(19)
      PC(27) = (PC(6) - PC(26))
      PC(28) = PC(7)*PC(20)
      PC(29) = PC(25)*PC(27)
      PC(30) = (PC(10) - PC(28) - PC(29))
      PC(31) = 1.0/PC(30)
      PC(32) = 1.0/PC(22)
      PC(33) = 1.0/PC(18)
      PC(34) = RBM*SPEED
      PC(35) = (NRBM*SPEED - L*CA1/SPEED)
      PC(36) = H*RBM/PC(18)
      PC(37) = (IXZ + IXX*THETAR + H*RBM*(CE + H*THETAR))/PC(18)

Figure 11. Listing of code generated to precompute constants for example vehicle model.
generalized simulation codes have been much less efficient, because many of the simplification methods routinely used by a human analyst are so specific to the system being analyzed that they cannot be incorporated into a generalized formulation. Also, some types of subcomponent models are difficult or impossible to include in the system description. Symbolic multibody programs have offered better efficiency than the generalized numerical codes, but they have not been capable of providing “complete” solutions. The user must still develop expressions for many active forces and moments by hand, and incorporate them correctly into the portion of the code generated automatically.

Methods  have  been  presented  for  representing  all  of  the  components  of  a  simulated 
multibody  system  in  symbolic  form  on  the  computer.  Data  objects  are  defined  for 
representing  (1)  symbolic  algebraic  expressions  for  vector/dyadic  analyses,  (2)  physical 
components  in  a  multibody  system,  and  (3)  program  structures  needed  in  a  simulation 
code.  A  language  called  AUTOSIM  has  been  written  in  Lisp  to  implement  these 
methods.  When  all  of  these  objects  are  available  for  computer  manipulation,  the  same 
modeling  and  programming  strategies  employed  by  humans  can  be  mimicked  in 
computer  software.  An  example  vehicle  handling  model  is  used  to  illustrate  how  forces 
and  moments  basic  to  vehicle  simulations  are  described  in  this  language,  and  how  the 
symbolic  computation  is  combined  with  Kane's  dynamics  analysis  formalism  to  generate 
a  working  simulation  code  for  that  model. 

With  these  methods,  an  automated  modeling  capability  now  exists  that  combines  the 
convenience  of  a  “complete”  solution  (associated  with  a  generalized  simulation  code) 
with  the  efficiency  and  specialization  possible  when  simulation  codes  are  developed  by 
hand. 
ABSTRACT. Shear instabilities in the form of shear bands are often observed when
metals  are  deformed  at  high  strain  rates.  According  to  one  theory,  shear  band  formation  is 
attributed  to  a  destabilizing  feedback  mechanism  induced  by  thermal  softening  properties 
of  materials.  In  this  note  we  review  some  mathematical  results  on  simple  model  problems, 
with  the  goal  of  assessing  the  effect  of  various  contributing  factors  (like  thermal  softening, 
strain  hardening  and  strain-rate  sensitivity)  on  the  response  of  shear  motions. 

1.  INTRODUCTION. 

One of the most striking manifestations of instability in solid mechanics is the localization of plastic deformation and the consequent formation of shear bands. Shear bands are observed during torsional tests on steels at high strain rates (e.g. [1], [6], [9], [10], [17]). According to one popular theory, the formation of shear bands is attributed to a thermoplastic instability mechanism that is induced by thermal softening properties of materials (i.e., the property that plastic flow is enhanced with temperature increase).

The argument goes as follows. Non-uniform straining induces non-uniform heating, which, in turn, enhances the plastic flow at hotter regions and reduces it at colder regions. This creates a destabilizing feedback mechanism which tends to create localization of plastic deformation and formation of shear bands. This destabilizing mechanism is opposed by internal dissipation and, possibly, by strain hardening properties of the material. Whether localization will occur depends on the relative weight of thermal softening, strain hardening and strain-rate sensitivity.

The intent of this work is to elucidate the interplay of thermal softening and strain hardening in shearing deformations of strain-rate dependent materials and to provide quantitative criteria for stability as well as instability.

As a test problem we consider the adiabatic, plastic shearing of an infinite plate of unit thickness. The plate is subjected to either steady shearing or prescribed tractions at the boundaries. In a Cartesian coordinate system the infinite plate occupies the region between the planes x = 0 and x = 1. The thermomechanical process is described by the following fields:

- the velocity field in the shearing direction, v(x,t);
- the shear strain, γ(x,t);
- the shear stress, σ(x,t);
- the temperature, θ(x,t);
- the heat flux, q(x,t).

We assume that the referential density and the specific heat are constants, taken equal to one, and that the elastic effects are negligible.
Then  the  balance  laws  of  momentum,  energy  and  the  kinematic  compatibility  relation  read 


$$v_t = \sigma_x, \tag{1.1}$$

$$\theta_t = \sigma v_x - q_x, \tag{1.2}$$

$$\gamma_t = v_x. \tag{1.3}$$


The above equations are supplemented with constitutive laws for the heat flux,

$$q = 0, \tag{1.4}$$

corresponding to the assumption that the process is adiabatic, and for the stress,

$$\sigma = f(\theta, \gamma, \gamma_t). \tag{1.5}$$

The constitutive assumption (1.5) is appropriate for a solid in the plastic region exhibiting three properties:

- thermal softening, $f_\theta(\theta,\gamma,\gamma_t) < 0$;
- strain hardening, $f_\gamma(\theta,\gamma,\gamma_t) > 0$;
- strain-rate sensitivity, $f_{\gamma_t}(\theta,\gamma,\gamma_t) > 0$.

At the present time there is not sufficient theoretical or experimental evidence to indicate precisely the form of the function $f(\theta,\gamma,\gamma_t)$. Several choices have been used in the literature. An example is the empirical power law (e.g. [6])

$$\sigma = \theta^{\nu}\gamma^{m}\gamma_t^{\,n}, \qquad \nu < 0,\; m > 0,\; n > 0. \tag{1.6}$$
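As a quick numerical check of the sign conditions on $f$, the sketch below evaluates the power law $\sigma = \theta^{\nu}\gamma^{m}\gamma_t^{n}$ (for $\gamma_t > 0$) and estimates its partial derivatives by central differences. The parameter values are illustrative, chosen only to satisfy $\nu < 0$, $m > 0$, $n > 0$.

```python
def stress(theta, gamma, gamma_t, nu, m, n):
    """Power law for gamma_t > 0: sigma = theta**nu * gamma**m * gamma_t**n."""
    return theta**nu * gamma**m * gamma_t**n


def partials(theta, gamma, gamma_t, nu, m, n, h=1e-6):
    """Central-difference estimates of (f_theta, f_gamma, f_gamma_t)."""
    f = lambda a, b, c: stress(a, b, c, nu, m, n)
    d_theta = (f(theta + h, gamma, gamma_t) - f(theta - h, gamma, gamma_t)) / (2 * h)
    d_gamma = (f(theta, gamma + h, gamma_t) - f(theta, gamma - h, gamma_t)) / (2 * h)
    d_rate = (f(theta, gamma, gamma_t + h) - f(theta, gamma, gamma_t - h)) / (2 * h)
    return d_theta, d_gamma, d_rate


# Illustrative parameter values satisfying nu < 0, m > 0, n > 0 (not from the text).
nu, m, n = -0.5, 0.3, 0.2
d_theta, d_gamma, d_rate = partials(2.0, 1.5, 0.8, nu, m, n)
print(d_theta < 0, d_gamma > 0, d_rate > 0)  # True True True
```

For this power law the exact derivatives are $\nu\sigma/\theta$, $m\sigma/\gamma$ and $n\sigma/\gamma_t$, so the three signs follow directly from the signs of $\nu$, $m$ and $n$.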

Several studies of (1.1)-(1.5) under various types of loading have appeared recently in the mathematical literature [2-4], [8], [13-15]. They mainly deal with special instances
of  the  constitutive  law  (1.6).  In  this  note  we  present  a  survey  of  these  results  with  two 
objectives: 

(a)  To  lay  out  the  stabilizing  or  destabilizing  influence  of  the  various  factors  associated 
with  the  problem  (like  thermal  softening,  strain  hardening  and  strain-rate  sensitivity) 
by  studying  completely  the  special  case  of  the  power  law.  This  is  done  in  Section  2. 

(b)  To  give  some  preliminary  answers  to  the  question:  “How  to  define  mathematically  a 
shear  band?”  This  question  is  undertaken  in  Section  3. 

2.  THE  POWER  LAW 

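We consider the system of partial differential equations

$$v_t = \left(\theta^{\nu}\gamma^{m}|v_x|^{n-1}v_x\right)_x, \tag{2.1}$$

$$\theta_t = \theta^{\nu}\gamma^{m}|v_x|^{n+1}, \tag{2.2}$$

$$\gamma_t = v_x, \tag{2.3}$$

$$\sigma = \theta^{\nu}\gamma^{m}|\gamma_t|^{n-1}\gamma_t, \qquad \nu < 0,\; n > 0,\; m \in \mathbb{R}, \tag{2.4}$$

where 0 < x < 1, t > 0, together with boundary conditions

$$v(0,t) = 0, \qquad v(1,t) = 1 \tag{BCV}$$

or

$$\sigma(0,t) = 0, \qquad \sigma(1,t) = 1 \tag{BCS}$$

and initial conditions

$$v(x,0) = v_0(x), \quad \theta(x,0) = \theta_0(x), \quad \gamma(x,0) = \gamma_0(x), \qquad 0 < x < 1. \tag{2.5}$$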
The system (2.1)-(2.3) consists of the degenerate parabolic equation (2.1) in v, coupled through the diffusion coefficient $\theta^{\nu}\gamma^{m}$ with the hyperbolic equations (2.2) and (2.3). Equations (2.1)-(2.3) are based on the power law (2.4). This law is appropriate for a material exhibiting thermal softening (ν < 0), either strain hardening (m > 0) or strain softening (m < 0), and strain-rate sensitivity. When (BCV) holds, the shearing is caused by steady shearing at the boundary. When (BCS) holds, it is caused by prescribed tractions at the boundaries.

The relevant question is to study the behavior of solutions of the equations (2.1)-(2.3), together with (BCV) (or (BCS)) and (2.5), for different values of the parameters ν, m and n. From the point of view of analysis, the key question is whether the diffusion coefficient in equation (2.1) tends to zero relatively slowly and in an “orderly” fashion, or whether nonuniformities develop and the material exhibits unstable response. A goal of the analysis is to provide quantitative criteria for stability and instability in terms of the parameters ν, m and n, and, in the case of instability, to determine whether shear bands form.

The system (2.1)-(2.3) is invariant under a group of scaling transformations (cf. [12]). Suppose {v(x,t), θ(x,t), γ(x,t)} is a solution of (2.1)-(2.3) on ℝ × (0, ∞). Define the rescaled functions {v^(λ)(x,t), θ^(λ)(x,t), γ^(λ)(x,t)} by

$$v^{(\lambda)}(x,t) = \lambda^{\delta}\, v(\lambda x, \lambda^{-\alpha} t), \tag{2.6}$$

$$\theta^{(\lambda)}(x,t) = \lambda^{2\delta}\, \theta(\lambda x, \lambda^{-\alpha} t), \tag{2.7}$$

$$\gamma^{(\lambda)}(x,t) = \lambda^{\alpha+\delta+1}\, \gamma(\lambda x, \lambda^{-\alpha} t), \tag{2.8}$$

where λ > 0 and δ, α are any constants with

$$2\nu\delta + m(\alpha + \delta + 1) + n(1 + \delta) + \alpha - \delta + 1 = 0. \tag{2.9}$$

The rescaled functions again form a solution. It is shown in [12] that the scaling invariance induces self-similar solutions which blow up when ν + m + n < 0. Although these self-similar solutions blow up at the boundary, they indicate the existence of a destabilizing mechanism induced by the variable diffusion coefficient in (2.1).
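The exponent bookkeeping behind the scaling invariance can be checked with exact rational arithmetic. Consider the rescaling $v \mapsto \lambda^{\delta}v(\lambda x,\lambda^{-\alpha}t)$, $\theta \mapsto \lambda^{2\delta}\theta(\lambda x,\lambda^{-\alpha}t)$, $\gamma \mapsto \lambda^{\alpha+\delta+1}\gamma(\lambda x,\lambda^{-\alpha}t)$. Under it, each term of (2.1)-(2.3) picks up a power of $\lambda$. The sketch solves the constraint $2\nu\delta + m(\alpha+\delta+1) + n(1+\delta) + \alpha - \delta + 1 = 0$ for $\alpha$. It then confirms that both sides of every equation carry the same power. The sample parameter values are arbitrary.

```python
from fractions import Fraction as F


def alpha_from_constraint(nu, m, n, delta):
    """Solve 2*nu*delta + m*(alpha+delta+1) + n*(1+delta) + alpha - delta + 1 = 0 for alpha (m != -1)."""
    return -(2 * nu * delta + m * (delta + 1) + n * (1 + delta) - delta + 1) / (m + 1)


def mismatches(nu, m, n, delta, alpha):
    """(LHS - RHS) powers of lambda in the three equations under the rescaling."""
    dt, dx = -alpha, 1                                  # d/dt brings lambda**(-alpha), d/dx brings lambda**1
    ev, eth, eg = delta, 2 * delta, alpha + delta + 1   # powers carried by v, theta, gamma
    coeff = nu * eth + m * eg                           # theta**nu * gamma**m
    eq1 = (ev + dt) - (coeff + n * (ev + dx) + dx)      # v_t = (theta^nu gamma^m |v_x|^(n-1) v_x)_x
    eq2 = (eth + dt) - (coeff + (n + 1) * (ev + dx))    # theta_t = theta^nu gamma^m |v_x|^(n+1)
    eq3 = (eg + dt) - (ev + dx)                         # gamma_t = v_x
    return eq1, eq2, eq3


nu, m, n, delta = F(-1, 2), F(1, 3), F(1, 5), F(2)
alpha = alpha_from_constraint(nu, m, n, delta)
print(alpha, mismatches(nu, m, n, delta, alpha))  # alpha = 21/20; all three mismatches are 0
```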

In  what  follows  we  describe  some  recent  results  towards  studying  the  asymptotic 
behavior  of  (2.1)  -  (2.5). 

(a)  Velocity  Boundary  Conditions  (BCV) 

The  system  of  equations  (2.1)  -  (2.3)  together  with  boundary  conditions  (BCV)  admits 
the  class  of  solutions 

$$v(x,t) = x, \tag{2.10}$$

$$\gamma(x,t) = t + \Gamma_0, \tag{2.11}$$

$$\theta(x,t) = \left\{\Theta_0^{\,1-\nu} + \frac{1-\nu}{m+1}\left[(t+\Gamma_0)^{m+1} - \Gamma_0^{\,m+1}\right]\right\}^{\frac{1}{1-\nu}}, \tag{2.12}$$

where Γ0 and Θ0 are positive constants. These solutions represent uniform shearing. The relevant question is whether, as t increases, v_x(x,t), θ(x,t) and γ(x,t) develop substantial nonuniformities, or instead approach the uniform shearing solutions as t → ∞.
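The closed form for the uniform-shearing temperature can be cross-checked numerically. With $v_x = 1$ and $\gamma = t + \Gamma_0$, the temperature equation reduces to the scalar ODE $\theta' = \theta^{\nu}(t+\Gamma_0)^{m}$. Its separable solution is $\theta = \{\Theta_0^{1-\nu} + \frac{1-\nu}{m+1}[(t+\Gamma_0)^{m+1} - \Gamma_0^{m+1}]\}^{1/(1-\nu)}$. The sketch compares that expression with a classical Runge-Kutta integration. Parameter values are illustrative.

```python
def theta_closed_form(t, nu, m, theta0, gamma0):
    """Uniform-shearing temperature from the separable solution."""
    base = theta0 ** (1 - nu) + (1 - nu) / (m + 1) * ((t + gamma0) ** (m + 1) - gamma0 ** (m + 1))
    return base ** (1 / (1 - nu))


def theta_rk4(t_end, nu, m, theta0, gamma0, steps=1000):
    """Integrate theta' = theta**nu * (t + gamma0)**m (v_x = 1, gamma = t + gamma0) with RK4."""
    f = lambda t, th: th ** nu * (t + gamma0) ** m
    h = t_end / steps
    t, th = 0.0, theta0
    for _ in range(steps):
        k1 = f(t, th)
        k2 = f(t + h / 2, th + h / 2 * k1)
        k3 = f(t + h / 2, th + h / 2 * k2)
        k4 = f(t + h, th + h * k3)
        th += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return th


args = dict(nu=-0.5, m=0.3, theta0=1.0, gamma0=0.5)   # illustrative values only
print(theta_closed_form(2.0, **args), theta_rk4(2.0, **args))
```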
This question remains open in the general case. However, some answers have been given in the special cases m = 0 and ν = 0; in each of these, one of the equations decouples from the rest. Specifically, the uniform shearing solutions are known to be asymptotically stable in two cases ([8], [13], [14], [16]):

(i) ν < 0, ν + n < 0, m = 0;
(ii) m < 0, m + n > 0, ν = 0.

(b)  Stress  boundary  conditions  (BCS) 

The main difficulty in the general case (i.e. both ν and m nonzero) is that the diffusion coefficient $\theta^{\nu}\gamma^{m}$ in the parabolic equation (2.1), governing the asymptotic distribution of v_x, does not have an a-priori trend of growth or decay, as was the situation in the special cases m = 0 or ν = 0. However, this difficulty can be circumvented in the case of stress boundary conditions.

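For technical simplicity, we restrict attention to the case n = 1. Differentiate $\sigma = \theta^{\nu}\gamma^{m}\gamma_t$ with respect to t and use (2.1)-(2.3). This shows that system (2.1)-(2.3) is equivalent (cf. [15]) to the system of reaction-diffusion equations

$$\sigma_t = \theta^{\nu}\gamma^{m}\sigma_{xx} + \frac{\sigma^{2}}{\theta^{\nu}\gamma^{m+1}}\left(\nu\frac{\sigma\gamma}{\theta} + m\right), \tag{2.13}$$

$$\theta_t = \frac{\sigma^{2}}{\theta^{\nu}\gamma^{m}}, \tag{2.14}$$

$$\gamma_t = \frac{\sigma}{\theta^{\nu}\gamma^{m}}, \tag{2.15}$$

with boundary conditions (BCS) and initial conditions

$$\sigma(x,0) = \sigma_0(x) > 0, \quad \theta(x,0) = \theta_0(x) > 0, \quad \gamma(x,0) = \gamma_0(x) > 0. \tag{2.16}$$

For these initial conditions, it turns out that σ(x,t) > 0, θ(x,t) > θ0(x) > 0 and γ(x,t) > γ0(x) > 0 for 0 < x < 1, t > 0. Moreover, the above initial-boundary value problem is well posed locally in time (in Schauder spaces).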
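To make the reaction-diffusion structure concrete, here is a minimal explicit finite-difference sketch of the n = 1 system. The equations are $\sigma_t = \theta^{\nu}\gamma^{m}\sigma_{xx} + \frac{\sigma^2}{\theta^{\nu}\gamma^{m}}(\nu\sigma/\theta + m/\gamma)$, $\theta_t = \sigma^2/(\theta^{\nu}\gamma^{m})$ and $\gamma_t = \sigma/(\theta^{\nu}\gamma^{m})$, with $\sigma(0,t) = 0$ and $\sigma(1,t) = 1$. The grid, the initial data and the step-size rule are assumptions chosen for illustration. This is a numerical illustration, not part of the analysis in [15]. It is only meant to exhibit the qualitative properties stated above: positivity of σ and growth of θ and γ.

```python
import math


def simulate(nu, m, n_pts=21, t_end=0.5):
    """Explicit finite-difference sketch of the n = 1 system with stress boundary conditions."""
    dx = 1.0 / (n_pts - 1)
    x = [i * dx for i in range(n_pts)]
    # Illustrative positive initial data compatible with sigma(0) = 0, sigma(1) = 1.
    sigma = [xi + 0.05 * math.sin(math.pi * xi) for xi in x]
    theta = [1.0] * n_pts
    gamma = [1.0] * n_pts
    t = 0.0
    while t < t_end:
        coef = [th ** nu * g ** m for th, g in zip(theta, gamma)]  # theta^nu gamma^m
        dt = min(0.25 * dx * dx / max(coef), t_end - t)            # explicit stability limit
        rate = [s / c for s, c in zip(sigma, coef)]                # v_x = gamma_t
        new_sigma = sigma[:]
        for i in range(1, n_pts - 1):
            s_xx = (sigma[i + 1] - 2.0 * sigma[i] + sigma[i - 1]) / (dx * dx)
            # Reaction term of the sigma equation, written as sigma * v_x * (nu*sigma/theta + m/gamma).
            react = sigma[i] * rate[i] * (nu * sigma[i] / theta[i] + m / gamma[i])
            new_sigma[i] = sigma[i] + dt * (coef[i] * s_xx + react)
        theta = [th + dt * s * r for th, s, r in zip(theta, sigma, rate)]  # theta_t = sigma * v_x
        gamma = [g + dt * r for g, r in zip(gamma, rate)]                  # gamma_t = v_x
        new_sigma[0], new_sigma[-1] = 0.0, 1.0                             # stress boundary conditions
        sigma = new_sigma
        t += dt
    return x, sigma, theta, gamma


x, sigma, theta, gamma = simulate(nu=-0.25, m=0.0)
print(max(abs(s - xi) for s, xi in zip(sigma, x)))
```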

Moreover, one can view (2.13) as a parabolic equation in σ with coefficients governed by (2.14), (2.15). Comparison principles for (2.13) then give a-priori bounds for σ(x,t), provided ν + m < 0. These estimates open the way to showing that the parameter space can be separated into three distinct regions, namely ν + m < −1, −1 < ν + m < −1/2 and −1/2 < ν + m < 0. The behavior of solutions changes drastically across these regions (cf. [15]).

1) In the region ν + m < −1, smooth solutions break down in finite time T*, for any initial data. As t → T*, $\sup_{0<x<1}\gamma(x,t)$ and $\sup_{0<x<1}v_x(x,t)$ tend to ∞, and so does $\sup_{0<x<1}\theta(x,t)$ in case m > −1. This happens in such a way that σ = θ^ν γ^m v_x remains bounded.

2) In the region −1 < ν + m < 0, smooth solutions exist globally in time for any initial data.

3) If in addition −1/2 < ν + m < 0, solutions stabilize as t → ∞ and asymptotically they behave as follows:

$$\sigma(x,t) = x + O\left(t^{\cdots}\right), \tag{2.17}$$

$$\theta(x,t) = x\,\gamma(x,t) + O\left(t^{\cdots}\right), \tag{2.18}$$

$$v(x,t) = t + O\left(t^{\cdots}\right). \tag{2.19}$$




4) In the special case m = 0, in the in-between region −1 < ν < −1/2, (2.17)-(2.19) are in general no longer satisfied. More precisely, given any ε > 0, there are initial data σ0(x), θ0(x) and γ0(x) such that |σ0(x) − x| < ε, 0 < x < 1, but σ(x,t) does not converge to x as t → ∞.

Finally, for m = 0, ν < −1 there are solutions that blow up only at the boundary x = 1 and look like shear bands. Currently, the above results are being extended to cover the general case of arbitrary n [16].

(c)  Other  boundary  conditions 

Charalambakis [2], [3] and Charalambakis and Houstis [4] consider (2.1)-(2.4) in the special case ν = 0 or m = 0, in situations where the shearing is caused by other boundary conditions or by body forces. They establish asymptotic stability results in these cases, under restrictions on the range of the parameters.

3.  SHEAR  BANDS 

In  this  Section  we  discuss  the  following  question:  “How  to  approach  shear  bands 
from  an  analysis  viewpoint?”  Traditionally,  the  formation  of  shear  bands  is  associated 
with  some  type  of  development  of  nonuniformities  in  the  field  variables  of  the  problem 
(e.g. [6], [9], [17]). It is, however, debatable whether a “slowly” evolving nonuniformity should be termed a shear band. Thus we think that the aforementioned question also has interesting practical implications.

As  a  point  of  reference  for  this  discussion  we  will  use  the  simple  model 


$$v_t = \left(\mu(\theta)\,v_x\right)_x, \tag{3.1}$$

$$\theta_t = \mu(\theta)\,v_x^{2}, \tag{3.2}$$

with boundary conditions (BCV) or (BCS) and initial conditions

$$\theta(x,0) = \theta_0(x), \quad v(x,0) = v_0(x), \qquad 0 < x < 1, \tag{3.3}$$

with θ0(x) > 0, v0′(x) > 0, 0 < x < 1. Recall that

$$\sigma = \mu(\theta)\,v_x, \tag{3.4}$$

with μ(θ) > 0, μ′(θ) < 0.

For this system, in the case of stress boundary conditions (BCS), it turns out [15] that there is a unique classical solution defined on a maximal interval of existence [0,1] × [0,T*). Moreover, if T* < ∞,

$$\limsup_{t \uparrow T^*}\ \sup_{0<x<1}\theta(x,t) = \infty \tag{3.5}$$

and

$$\limsup_{t \uparrow T^*}\ \sup_{0<x<1}v_x(x,t) = \infty. \tag{3.6}$$

If $\int^{\infty}\mu(\xi)\,d\xi = \infty$, then T* = +∞, while if $\int^{\infty}\mu(\xi)\,d\xi < \infty$, then T* < +∞. In the special case

$$\mu(\theta) = \theta^{\nu}, \qquad \nu < 0, \tag{3.7}$$
the parameter space ν < 0 is decomposed into three regions:

(a) ν < −1, a region of blowup;
(b) −1 < ν < −1/2, a region of global existence but where nonuniformities of the initial data persist;
(c) −1/2 < ν < 0, a region where dissipation is predominant and leads to stable response (cf. Section 2).
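For μ(θ) = θ^ν with ν < 0, the integral criterion above can be evaluated directly. The worked check below takes 1 as an arbitrary finite lower limit, since only the behavior at infinity matters:

$$\int_{1}^{\infty}\xi^{\nu}\,d\xi = \begin{cases} -\dfrac{1}{\nu+1} < \infty, & \nu < -1,\\[4pt] +\infty, & -1 \le \nu < 0. \end{cases}$$

Under that criterion, finite-time blowup (T* < ∞) is therefore expected precisely when ν < −1, in agreement with the blowup region ν < −1.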

It can be seen from the relevant analysis [15] that the region ν < −1 is clearly associated with the development of shear bands. The region −1 < ν < −1/2 is also associated with nonuniform response.

Beyond the above type of response, there is a different, more subtle type of response which may be associated with localization and formation of shear bands. This possibility was pointed out to the author by Dr. T. Wright. Namely, it is conceivable that large-time results describe the predominance of dissipation as t → ∞, while at intermediate times large structures develop and then get washed out because of the dissipation. This is another possible scenario that needs to be investigated.

To pursue the subtleties of this question one step further, consider the system (3.1)-(3.4) with velocity boundary conditions (BCV) in the special case μ(θ) = θ^ν, ν < 0. In this case the uniform shearing solutions (v(x), Θ(t)) are

$$v(x) = x, \qquad \Theta(t) = \left[\Theta_0^{\,1-\nu} + (1-\nu)\,t\right]^{\frac{1}{1-\nu}},$$

corresponding to initial data (v0(x), θ0(x)) = (x, Θ0). It is shown in [8] that if [v(x,t), θ(x,t)] is any solution of (3.1)-(3.4) and (BCV), then

$$v_x(x,t) = 1 + O\left(t^{\cdots}\right), \tag{3.8}$$

$$\frac{1}{1-\nu}\,\theta^{1-\nu}(x,t) = t + O\left(t^{1-\cdots}\right) \tag{3.9}$$

as t → ∞. Moreover, suppose (v0(x), θ0(x)) is a small perturbation of (x, Θ0), for some Θ0 > 0. Then one obtains in [16] that

$$|\theta(x,t) - \Theta(t)| \le \delta\,(1+t)^{\frac{1}{1-\nu}}, \tag{3.10}$$

where δ is of the order of magnitude of the initial perturbation. In other words, the distance of the solution θ(x,t) from the uniform shearing solution Θ(t) is controlled by the rate of growth of Θ(t) and the initial perturbation. Nevertheless, it is still possible that this difference grows, but at a slower rate.

The relevant question here is when a time-dependent solution should be called asymptotically stable. One option is to require only that the perturbation grow at a slower rate than the basic solution; the other is to require that the perturbation decay. Both answers are legitimate as far as mathematical definitions are concerned. However, they have different implications for when to call a process stable and, for this particular problem, for what to define as nonuniform response and shear band. Further analysis, as well as numerical experimentation on simple models, is needed in order to answer this question.
ABSTRACT. This paper presents an elastic-plastic analysis of a thick-walled composite tube subjected to internal pressure. The composite tube is
constructed  of  a  steel  liner  and  a  graphite-bismaleimide  outer  shell. 

Analytical  expressions  for  stresses,  strains,  and  displacements  are  derived  for 
all  cases  where  the  structure  is  subjected  to  internal  pressure.  The  loading 
ranges  include  elastic,  elastic-plastic,  and  fully-plastic  up  to  failure. 
Numerical  results  for  the  hoop  strains  in  several  composite  tubes  are  presented. 

INTRODUCTION. Organic composites have become familiar structural components in many applications that require high stiffness and low weight. A current problem in Army cannon design is to replace a portion of the steel wall thickness with an organic composite. The steel liner maintains the tube-projectile interface and shields the composite from the extremely hot gases. For transferring loads, the steel also has better elastic properties in the radial direction than the composite. The theoretical and experimental results for an organic composite-jacketed steel tube subjected to internal pressure in the elastic range were reported in a recent paper [1]. This paper presents an elastic-plastic analysis of the composite tube problem. The composite tube is constructed of a steel liner and a graphite-bismaleimide outer shell.

Analytical  expressions  for  stresses,  strains,  and  displacements  are  derived  for 
loading  within  and  beyond  the  elastic  range  up  to  failure. 

ELASTIC  RANGE.  Figure  1  shows  a  schematic  of  the  composite  tube  problem. 
The  composite  tube  consists  of  an  inner  steel  "liner"  and  an  outer  composite 
"jacket."  The  steel  liner  of  inside  radius  a  and  outer  radius  b  is  wrapped  in 
the  circumferential  direction  with  a  graphite-bismaleimide  organic  composite  of 
outside  radius  c.  The  elastic  material  constants  for  the  composite  and  the 
steel  are  given  in  Table  1. 

TABLE 1. ELASTIC CONSTANTS OF COMPOSITE JACKET AND STEEL LINER

Elastic Constants for IM6/Bismaleimide, 55% F.V.R.

    E_r = 1.126 Mpsi      ν_rθ = 0.01524      ν_θr = 0.3155
    E_θ = 23.31 Mpsi      ν_θz = 0.3155       ν_zθ = 0.01524
    E_z = 1.126 Mpsi      ν_zr = 0.3991       ν_rz = 0.3911

Elastic Constants for Steel

    E = 30.0 Mpsi         ν = 0.3
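The compliance matrix of an orthotropic material is symmetric, so the composite entries of Table 1 should satisfy $\nu_{ij}/E_i = \nu_{ji}/E_j$. This assumes $\nu_{ij}$ denotes the contraction in direction $j$ under uniaxial stress in direction $i$. The short sketch below evaluates the relative mismatch for each pair using the tabulated values.

```python
# Elastic constants of the composite jacket from Table 1 (moduli in Mpsi).
E = {"r": 1.126, "theta": 23.31, "z": 1.126}
NU = {
    ("r", "theta"): 0.01524, ("theta", "r"): 0.3155,
    ("theta", "z"): 0.3155, ("z", "theta"): 0.01524,
    ("z", "r"): 0.3991, ("r", "z"): 0.3911,
}


def reciprocity_gap(i: str, j: str) -> float:
    """Relative mismatch in nu_ij / E_i = nu_ji / E_j (symmetric compliance matrix)."""
    a = NU[(i, j)] / E[i]
    b = NU[(j, i)] / E[j]
    return abs(a - b) / max(abs(a), abs(b))


for pair in [("r", "theta"), ("theta", "z"), ("r", "z")]:
    print(pair, f"{reciprocity_gap(*pair):.2e}")
```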
where σ0 is the initial tensile yield stress and E* is the tangent modulus in the plastic range of the stress-strain curve.

Using Eqs. (11) and (13) and the requirement of displacement continuity at the interface, i.e., u_b(liner) = u_b(jacket), we obtain the expression for the interface pressure q as
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(23) 


g_  _ _ iicHlieii!?! _ 

(l«v)(l-2v)  *  6«x22k  -  «12l 

Given any value of ρ in a < ρ ≤ b, we can now determine q, u, and all the
stresses and strains in the tube. In particular, the expressions for the internal
pressure and for the displacements at the bore and the interface are


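[Expression for p illegible in the source scan.]   (24)

$$\frac{E u_a}{\sigma_0 a} = -(1-\nu-2\nu^2)\frac{p}{\sigma_0} + (1-\nu^2)\frac{\rho^2}{a^2} \tag{25}$$

$$\frac{E u_b}{\sigma_0 b} = (1-\nu^2)\frac{\rho^2}{b^2} - (1-\nu-2\nu^2)\frac{q}{\sigma_0} \tag{26}$$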
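As a hedged cross-check of Eq. (26), the still-elastic part of the liner, ρ ≤ r ≤ b, can be treated as a Lamé annulus in plane strain, loaded by the interface pressure q at r = b and sitting exactly at the Tresca yield condition σ_θ − σ_r = σ0 at r = ρ. The sketch below computes E u_b/(σ0 b) that way and compares it with the right side of (26). The values of ρ and q are arbitrary illustrations; σ0 = 120 ksi and the steel constants are taken from the paper.

```python
def lame_interface_strain(rho, b, q, sigma0, E, nu):
    """E*u_b/(sigma0*b) from the Lame solution sigma_r = A - B/r**2 on rho <= r <= b."""
    B = sigma0 * rho**2 / 2.0          # sigma_theta - sigma_r = 2B/r^2 = sigma0 at r = rho
    A = -q + B / b**2                  # sigma_r(b) = -q
    sig_r = A - B / b**2
    sig_t = A + B / b**2
    # plane-strain Hooke's law: E*eps_theta = (1 - nu^2)*sig_t - nu*(1 + nu)*sig_r
    eps_t = ((1 - nu**2) * sig_t - nu * (1 + nu) * sig_r) / E
    return E * eps_t / sigma0          # u_b/b equals eps_theta at r = b

def eq26(rho, b, q, sigma0, nu):
    """Right-hand side of Eq. (26)."""
    return (1 - nu**2) * rho**2 / b**2 - (1 - nu - 2 * nu**2) * q / sigma0

# Illustrative values only (ksi, inches).
args = dict(rho=0.95, b=1.0, q=10.0, sigma0=120.0)
print(lame_interface_strain(E=30000.0, nu=0.3, **args), eq26(nu=0.3, **args))
```

Both functions return the same number (about 0.7779 here), because 1 − ν − 2ν² = (1 + ν)(1 − 2ν) is exactly the combination that the Lamé algebra produces.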


By letting ρ = a and ρ = b, we can determine the lower limits p*, q*, u_a*, u_b*,
u_c* and the upper limits p**, q**, u_a**, u_b**, u_c**, respectively.

FULLY-PLASTIC RANGE. When the internal pressure p is further increased,
i.e., p > p**, the steel liner becomes fully plastic. The composite jacket
remains elastic as long as the failure pressure is not reached. In this section,
a fully-plastic solution is derived.

Subject to σ_θ ≥ σ_z ≥ σ_r, Tresca's criterion states that yielding occurs
when

$$\sigma_\theta - \sigma_r = \sigma \tag{27}$$

where σ is the yield stress. For a linear strain-hardening material,

$$\sigma = \sigma_0(1 + \eta\,\varepsilon^p) \tag{28}$$

where σ0, η, and ε^p are the initial yield stress, hardening parameter, and
equivalent plastic strain, respectively. The associated flow rule states that,
subject to σ_θ > σ_z > σ_r,

$$d\varepsilon_\theta^p = -d\varepsilon_r^p \ge 0, \qquad d\varepsilon_z^p = 0 \tag{29}$$

where dε_z^p is an increment of plastic strain, defined by dε_z^p = dε_z − dε_z^e.

Since dε_z^p = 0, dε_z = dε_z^e, and therefore

$$\varepsilon_z = \varepsilon_z^e = (\sigma_z - \nu\sigma_r - \nu\sigma_\theta)/E \tag{30}$$

In the plane-strain case (ε_z = 0), and using the equation of equilibrium,

$$\sigma_\theta = \sigma_r + r\sigma_r', \qquad \sigma_r' = d\sigma_r/dr \tag{31}$$

we have

$$\sigma_z = 2\nu\sigma_r + \nu r\sigma_r' \tag{32}$$


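Since the dilatation is purely elastic,

$$u' + u/r + \varepsilon_z = E^{-1}(1-2\nu)(\sigma_r + \sigma_\theta + \sigma_z) \tag{33}$$

Substituting from Eqs. (31) and (32),

$$u' + u/r = E^{-1}(1-2\nu)(1+\nu)(2\sigma_r + r\sigma_r') \tag{34}$$

On integration,

$$ru = E^{-1}(1-2\nu)(1+\nu)r^2\sigma_r + \phi b^2 \tag{35}$$

where

$$\phi = u_b/b + (1-2\nu)(1+\nu)E^{-1}q \tag{36}$$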
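A quick numerical way to confirm that ru = E⁻¹(1 − 2ν)(1 + ν)r²σ_r + φb² integrates u' + u/r = E⁻¹(1 − 2ν)(1 + ν)(2σ_r + rσ_r'), with φ = u_b/b + (1 − 2ν)(1 + ν)q/E, is to pick any smooth radial-stress profile with σ_r(b) = −q and difference the result. In the Python sketch below the profile `sigma_r` and the value of `u_b` are hypothetical test inputs, not results from the paper.

```python
import math

E, nu = 30000.0, 0.3                     # ksi, steel values from Table 1
a, b, q, u_b = 0.9, 1.0, 10.0, 2.0e-3    # illustrative geometry, load and interface displacement
c = (1 - 2 * nu) * (1 + nu) / E

def sigma_r(r):
    """Hypothetical smooth test profile satisfying sigma_r(b) = -q."""
    return -q - 40.0 * math.log(b / r)

phi = u_b / b + c * q                    # integration constant of Eq. (36)

def u(r):
    """Radial displacement from Eq. (35) solved for u."""
    return (c * r**2 * sigma_r(r) + phi * b**2) / r

def residual(r, h=1e-5):
    """LHS minus RHS of Eq. (34), with central differences for u' and sigma_r'."""
    du = (u(r + h) - u(r - h)) / (2 * h)
    dsig = (sigma_r(r + h) - sigma_r(r - h)) / (2 * h)
    return (du + u(r) / r) - c * (2 * sigma_r(r) + r * dsig)

print(max(abs(residual(a + k * (b - a) / 10)) for k in range(11)), u(b))
```

The residual stays at round-off level across the liner and u(b) returns the prescribed u_b, which is the boundary condition that fixes φ.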

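Using Hooke's law and Eqs. (27), (28), (31), and (32), we obtain

$$E\varepsilon_\theta^e = (1-2\nu)(1+\nu)\sigma_r + (1-\nu^2)\sigma_0(1 + \eta\varepsilon^p) \tag{37}$$

Substituting from Eq. (35) for ε_θ and from Eq. (37) for ε_θ^e,

$$\varepsilon_\theta^p = \varepsilon_\theta - \varepsilon_\theta^e = \phi b^2/r^2 - E^{-1}(1-\nu^2)\sigma_0(1 + \eta\varepsilon^p) \tag{38}$$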

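By Eq. (29) and the definition of equivalent plastic strain,

$$\varepsilon^p = \int d\varepsilon^p = \int\sqrt{\tfrac{2}{3}\left[(d\varepsilon_\theta^p)^2 + (d\varepsilon_r^p)^2 + (d\varepsilon_z^p)^2\right]} = \frac{2}{\sqrt{3}}\,\varepsilon_\theta^p \tag{39}$$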

Combining Eqs. (38) and (39) leads to

$$\varepsilon^p = \frac{2}{\sqrt{3}}\,\frac{\phi b^2/r^2 - (1-\nu^2)\sigma_0/E}{1 + \frac{2}{\sqrt{3}}(1-\nu^2)\eta\sigma_0/E} \tag{40}$$

Substituting Eqs. (27) and (28) into Eq. (31) and integrating, we have

$$\sigma_r = -p + \sigma_0\ln(r/a) + \sigma_0\eta\int_a^r \varepsilon^p r^{-1}\,dr \tag{41}$$


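Now, using Eq. (40), an explicit expression for the radial stress is obtained:

$$\sigma_r = -p + \sigma_0(1-\eta\beta)\ln\frac{r}{a} + \frac{\eta\beta}{2(1-\nu^2)}\left(\frac{1}{a^2} - \frac{1}{r^2}\right)E\phi b^2 \tag{42}$$

where β = (2/√3)(1 − ν²)(σ0/E)/[1 + (2/√3)(1 − ν²)ησ0/E].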
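The closed form for σ_r can be checked against direct quadrature of Eq. (41) with ε^p taken from Eq. (40). The Python sketch below does this with Simpson's rule. Here β is taken as (2/√3)(1 − ν²)(σ0/E)/[1 + (2/√3)(1 − ν²)ησ0/E], the combination that turns the logarithmic coefficient of (41) into σ0(1 − ηβ); the values of η, p and φ are arbitrary illustrations, not data from the paper.

```python
import math

E, nu, sigma0, eta = 30000.0, 0.3, 120.0, 10.0   # ksi; eta chosen for illustration
a, b, p, phi = 0.9, 1.0, 60.0, 6.0e-3             # illustrative load and constant phi

g = 2 / math.sqrt(3)
beta = g * (1 - nu**2) * (sigma0 / E) / (1 + g * (1 - nu**2) * eta * sigma0 / E)

def eps_p(r):
    """Equivalent plastic strain, Eq. (40)."""
    return g * (phi * b**2 / r**2 - (1 - nu**2) * sigma0 / E) / (
        1 + g * (1 - nu**2) * eta * sigma0 / E)

def sigma_r_quadrature(r, n=2000):
    """Eq. (41) with the integral evaluated by the composite Simpson rule (n even)."""
    h = (r - a) / n
    f = lambda s: eps_p(s) / s
    total = f(a) + f(r) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return -p + sigma0 * math.log(r / a) + sigma0 * eta * total * h / 3

def sigma_r_closed(r):
    """Closed-form radial stress, Eq. (42)."""
    return (-p + sigma0 * (1 - eta * beta) * math.log(r / a)
            + eta * beta / (2 * (1 - nu**2)) * (1 / a**2 - 1 / r**2) * E * phi * b**2)

for r in (0.95, 1.0):
    print(r, sigma_r_quadrature(r), sigma_r_closed(r))
```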
Using Eqs. (11) and (36) and the requirement of displacement continuity at the
interface, i.e., u_b^- (liner) = u_b^+ (jacket), we get

$$\phi = \left[E\alpha_{22}k\,\frac{(c/b)^{2k}+1}{(c/b)^{2k}-1} - E\alpha_{12} + (1-2\nu)(1+\nu)\right]\frac{q}{E} \tag{43}$$

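Evaluating σ_r at the interface from Eq. (42) and using Eq. (43), we obtain the
relation between p and q:

$$p = \sigma_0(1-\eta\beta)\ln\frac{b}{a} + q\left\{1 + \frac{\eta\beta}{2(1-\nu^2)}\left(\frac{b^2}{a^2}-1\right)\left[E\alpha_{22}k\,\frac{(c/b)^{2k}+1}{(c/b)^{2k}-1} - E\alpha_{12} + (1-2\nu)(1+\nu)\right]\right\} \tag{44}$$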

It is interesting to point out that p is a linear function of q. Similarly,
evaluating u at the bore from Eq. (35), we obtain

$$u_a/a = -(1-2\nu)(1+\nu)p/E + \phi b^2/a^2 \tag{45}$$

which can also be expressed as a linear function of q with the aid of Eqs. (43)
and (44). Since the relation between q and u_b is linear by Eq. (11), p and
u_a, given by Eqs. (44) and (45), respectively, can be expressed as linear
functions of u_b.

NUMERICAL RESULTS. Given any value of internal pressure, we can obtain
numerical results for the radial and tangential stresses and strains, and for
the displacement, at any radial position in a composite tube. However, only the
values at the bore, interface, and outside surface have been calculated. The
organic composite material is considered to be elastic until brittle failure
occurs at a maximum strain of 1.3 percent. The steel is assumed to be
elastic-plastic with linear strain hardening, with σ0 = 120 Ksi, E* = 0.04 E,
and σu (ultimate stress) = 140 Ksi.
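The text gives the hardening in terms of the tangent modulus E*, while Eq. (28) is written with a hardening parameter η. One common way to connect them, assumed here and not stated in the paper, is the uniaxial identification dσ/dε^p = E·E*/(E − E*) = σ0η. Under that assumption the Python sketch below gives η and the equivalent plastic strain at which the ultimate stress would be reached.

```python
# Hedged sketch: eta from E* under the uniaxial identification
# d(sigma)/d(eps_p) = E*E_t/(E - E_t) = sigma0*eta (an assumption, not the paper's stated rule).

E = 30000.0          # ksi (30.0 Mpsi, Table 1)
E_t = 0.04 * E       # tangent modulus E* in the plastic range
sigma0 = 120.0       # initial yield stress, ksi
sigma_u = 140.0      # ultimate stress, ksi

H = E * E_t / (E - E_t)                         # plastic modulus, ksi
eta = H / sigma0                                # sigma = sigma0*(1 + eta*eps_p)
eps_p_at_ultimate = (sigma_u / sigma0 - 1.0) / eta

print(f"H = {H:.1f} ksi, eta = {eta:.3f}, eps_p at sigma_u = {eps_p_at_ultimate:.4f}")
# prints: H = 1250.0 ksi, eta = 10.417, eps_p at sigma_u = 0.0160
```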

The relations between bore hoop strain and internal pressure are presented
in Figures 2 and 3. Figure 2 shows the relations for four tubes of wall ratio
1.321 and Figure 3 for three tubes of wall ratio 1.546. The percentage of
composite in each tube is defined by (c − b)/(c − a) × 100 percent. The relation
between hoop strain and internal pressure is nonlinear in the elastic-plastic
range, and the two limits of that range are indicated in the figures. The
nonlinear range becomes smaller as the percentage of composite increases. For a
given strain in the elastic range, the steel tube can resist a larger pressure
than the composite tube. However, for a large strain in the fully-plastic range,
the composite tube can support a larger pressure than the steel tube. This
advantage in containing higher pressure makes composite tubes seem very
attractive. It is also interesting to note that the nonlinear elastic-plastic
range becomes larger as the wall ratio increases, as shown in these two figures.

The numerical results for the hoop strains at the bore, interface, and outside
surface of three composite tubes are shown in Figures 4, 5, and 6 as functions
of internal pressure. The actual specimens were constructed [1] using steel
liners of two thicknesses, with the appropriate thickness of composite
circumferentially wound on the liner. The geometric dimensions (a, b, c) for the
three composite tubes are (0.9, 1.0, 1.189), (0.9, 1.07, 1.189), and (0.9, 1.07,
1.391). Figures 4, 5, and 6 show the numerical results for these tubes,
respectively. The complete range of loading (elastic, elastic-plastic, and
fully-plastic) up to failure pressure has been considered. Brittle failure of the
composite material occurs at a maximum strain of 1.3 percent. The maximum values
of internal pressure these three tubes can contain without failure are 56.9,
53.1, and 78.0 Ksi, respectively. In these figures we also show the limits of
internal pressure in the elastic-plastic range, i.e., (p*, p**) = (20.48, 23.93),
(23.06, 28.75), and (27.47, 34.98) Ksi, respectively.
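As a small consistency check (a sketch, not part of the original analysis), the wall ratio c/a and the composite percentage (c − b)/(c − a) × 100 can be computed for the three specimen geometries quoted above:

```python
# Specimen geometries (a, b, c) quoted in the text, in consistent length units.
tubes = [(0.9, 1.0, 1.189), (0.9, 1.07, 1.189), (0.9, 1.07, 1.391)]

def wall_ratio(a, b, c):
    return c / a

def composite_percent(a, b, c):
    """Percentage of composite in the wall, (c - b)/(c - a) x 100, as defined in the text."""
    return (c - b) / (c - a) * 100.0

for a, b, c in tubes:
    print(f"c/a = {wall_ratio(a, b, c):.3f}, composite = {composite_percent(a, b, c):.1f}%")
# c/a = 1.321, composite = 65.4%
# c/a = 1.321, composite = 41.2%
# c/a = 1.546, composite = 65.4%
```

The wall ratios reproduce the values 1.321 and 1.546 used for Figures 2 and 3.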
Figure 1. Schematic of a composite tube problem


Figure 2. The relation between bore hoop strain and internal pressure
for four tubes of wall ratio 1.321

Figure 3. The relation between bore hoop strain and internal pressure
for three tubes of wall ratio 1.546


Figure 4. Hoop strains at the bore, interface, and outside surface as
functions of internal pressure for a composite tube (a = 0.

Figure 5. Hoop strains at the bore, interface, and outside surface as functions
of internal pressure for a composite tube (a = 0.9, b = 1.07, c = 1.391)


ULTRAFAST  THERMODYNAMIC  PROCESSES 


Richard  A.  Weiss 

U.  S.  Army  Engineer  Waterways  Experiment  Station 
Vicksburg,  Mississippi  39180 


ABSTRACT. The conventional thermodynamic description of a rapid reversible
process assumes that the process is adiabatic and no heat is exchanged between
the thermodynamic system and the environment, so that the entropy of the system
remains constant. This paper suggests the possibility of processes that occur so
fast that the magnitudes of both the entropy and the internal energy of the
system remain constant. For such a system there is an exchange of heat with the
environment in the form of a change of the internal phases of the thermodynamic
system. The thermodynamic equations for internal phase changing processes are
developed, and a general procedure is developed for relating the temperature and
density for a system undergoing an ultrafast process. The magnitude and internal
phase angle of the pressure associated with an ultrafast thermodynamic process
are calculated. The rapid processes that occur in supernovae may possibly be
described by these calculations. Applications to the early stages of chemical
reactions are suggested.

1. INTRODUCTION. Processes that occur very fast appear in both astrophysical
and laboratory situations. For example, rapid nuclear processes occur in stars
before and during supernova explosions.1-5 These include electron capture by
protons and the rapid capture of neutrons by atomic nuclei. In addition there are
the processes associated with the core bounce and the subsequent generation of
shock waves. Finally, associated with stellar core collapse is the generation of
neutrinos, which interact with the stellar atmosphere and often produce pressures
sufficient to blow off the atmosphere.1-5 These processes occur on very short
time scales, and the adequacy of the adiabatic assumption of ordinary
thermodynamics comes into question: the adiabatic process requires the internal
energy to change, and this may occur on a slower time scale than the short time
scale of the physical process itself.

The description of the interaction of gravity waves with matter, as in the
case of a laboratory gravitational wave detector, needs to account for the rapid
distortion of atoms and molecules due to the rapid change of the curvature of
spacetime.6,7 A description of these ultrafast gravity wave interactions
requires a state equation for matter which includes parameters that determine
the effects of gravity waves on the atomic structure of matter. Such a state
equation has been developed for the real gases.8 Again the question arises as to
whether the adiabatic assumption is a valid description of the interaction of
gravity waves with matter, or whether something more sophisticated is required
to describe this extremely fast process.

Rapid processes also occur in more conventional laboratory experiments.
Consider the actual processes that occur during chemical reactions, such as
chemical bond breaking and formation, which may occur on the femtosecond time
scale.9-11 Another example of an ultrafast process that may require
reinterpretation is the case of subpicosecond laser pulses interacting with
matter.12-14 A better understanding of the state equations of matter and of
ultrafast thermodynamic processes is needed to describe these physical processes.


A theory of relativistic thermodynamic state equations has been developed
in order to account for a difficulty with the state equation of matter at high
densities, namely the fact that the state equation is not nearly as stiff as is
predicted by conventional calculations.15 The four-dimensional Minkowski
spacetime of special relativity was introduced through the development of a
relativistic trace equation, and specific solutions of this equation were
developed for solids, quantum liquids and the real classical gases.15,16 In
order to have the Lie group e^(jθ) (and e^(−jθ)) as the gauge group of
relativistic thermodynamics, the concept of thermodynamic variables with internal
phase angles was introduced.17,18

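The trace equation for completely symmetric matter is given by15

$$U + T\left(\frac{\partial U}{\partial T}\right)_{PV} - 3V\frac{\partial}{\partial V}(PV)_U = U_a + T\left(\frac{\partial U_a}{\partial T}\right)_{P_aV} \tag{1}$$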

where U and P = relativistic internal energy and pressure respectively, U_a and
P_a = unrenormalized energy and pressure respectively, and T and V = temperature
and volume respectively. The trace equation for matter whose thermodynamic
functions have broken symmetries is given by

$$\tilde U + T\left(\frac{\partial\tilde U}{\partial T}\right)_{\tilde PV} - 3V\frac{\partial}{\partial V}(\tilde PV)_{\tilde U} = U_a + T\left(\frac{\partial U_a}{\partial T}\right)_{P_aV} \tag{2}$$

where Ũ and P̃ = complex number representations of the renormalized internal
energy and pressure respectively. Equation (2) can be further simplified by using
the following form of the Gibbs-Helmholtz-Maxwell equation18

$$\frac{\partial\tilde U}{\partial V} = T\left(\frac{\partial\tilde P}{\partial T}\right)_V - \tilde P \tag{3}$$

The complex numbers Ũ and P̃ that appear in equations (2) and (3) are written as:

$$\tilde U = U e^{j\theta_U} \tag{4}$$

$$\tilde P = P e^{j\theta_P} \tag{5}$$

where U, P, θ_U and θ_P are obtained from a solution of equations (2) and (3).
In a similar fashion the complex number entropy is written as18

$$\tilde S = S e^{j\theta_S} \tag{6}$$

As an illustrative example of the use of equations (1) and (2), they can be
applied to real gases.

The pressure of an ordinary real gas is written as19

$$P_a = nR_aT(1 + B_an + C_an^2 + \cdots) \tag{7}$$

where n = reciprocal volume, and R_a, B_a and C_a = ordinary gas constant, second
virial coefficient and third virial coefficient respectively. The corresponding
pressure for a symmetric relativistic real gas is written as

$$P = nRT(1 + Bn + Cn^2 + \cdots) \tag{8}$$


where R, B and C = corresponding relativistic gas constant, second virial
coefficient and third virial coefficient respectively, which are given by8,15

[Expressions for R and B illegible in the source scan.]   (9), (10)

$$C = C_a - 3B_a^2\ln\psi_a \tag{11}$$

where

$$\psi_a = \left[\frac{B_a(T)}{B_a(T_R)}\right]^{2/3} \tag{12}$$


where T_R = species-dependent relativity temperature constant. For the case of
a relativistic real gas with broken internal symmetry the pressure is written as17

$$\tilde P = nRT(1 + Bn + \tilde Cn^2 + \cdots) \tag{13}$$

where C̃ is obtained from a solution of equation (2) and is given by17

$$\tilde C = C_a - 3B_a^2\ln\psi_a\,e^{j\theta_f} \tag{14}$$

where θ_f is given by the solution of a set of coupled differential equations.17
Ultrafast thermodynamic processes, for which both the entropy and the magnitude
of the internal energy are fixed, are only possible in systems like the real
gases that have a parameter T_R which varies during the process. The parameter
T_R depends on the species of atoms in the gas, and therefore T_R changes for
processes that alter the composition of the gas.


This paper considers thermodynamic processes that are sufficiently rapid
to keep both the entropy and the magnitude of the internal energy constant, or
to keep the magnitudes of both the entropy and the internal energy fixed. Such
processes change the internal phases of the entropy and internal energy: the
entropy and internal energy vectors essentially rotate (in internal space) but
do not stretch. This is a special case of the general situation where the
complex number thermodynamic functions rotate and stretch during thermodynamic
processes.18 A theory of ultrafast thermodynamic processes is developed and an
expression for the pressure associated with these processes is derived.

2. ULTRAFAST THERMODYNAMIC PROCESSES. This section considers the
thermodynamic equations that describe ultrafast processes occurring in matter
that has internal phase angles associated with the thermodynamic functions. The
general thermodynamic equations of matter and radiation with internal phase have
already been developed in the literature.17,18 The expression for the first law
of thermodynamics for matter with internal phase is written as18

$$d\tilde Q = T\,d\tilde S = d\tilde U + \tilde P\,dV + \sum_\alpha \tilde M_\alpha\,d\alpha \tag{15}$$

where Q̃ = complex number heat, M̃_α = set of generalized complex number forces
and α = set of generalized extensive variables.

For the special case where the changes in entropy and internal energy are of
the form of rotations, it follows from equations (4) and (6) that18

$$d\tilde Q = T\,d\tilde S = jT\tilde S\,d\theta_S \tag{16}$$

$$d\tilde U = j\tilde U\,d\theta_U \tag{17}$$

which combined with equation (15) gives the following result for an ultrafast
process where S and U are both constant:

$$jT\tilde S\,d\theta_S = j\tilde U\,d\theta_U + \tilde P\,dV + \sum_\alpha \tilde M_\alpha\,d\alpha \tag{18}$$

The generalized force can be written as

$$\tilde M_\alpha = M_\alpha e^{j\theta_{M_\alpha}} \tag{19}$$

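The real and imaginary parts of equation (18) can be written as

$$-TS\sin\theta_S\,d\theta_S = -U\sin\theta_U\,d\theta_U + P\cos\theta_P\,dV + \sum_\alpha M_\alpha\cos\theta_{M_\alpha}\,d\alpha \tag{20}$$

$$TS\cos\theta_S\,d\theta_S = U\cos\theta_U\,d\theta_U + P\sin\theta_P\,dV + \sum_\alpha M_\alpha\sin\theta_{M_\alpha}\,d\alpha \tag{21}$$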
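The split of the complex first-law relation into Eqs. (20) and (21) rests only on j·e^(jθ) = −sin θ + j cos θ. The Python sketch below evaluates each side of the complex relation (with a single generalized force) and compares its real and imaginary parts with the trigonometric forms; all numerical values are arbitrary test inputs.

```python
import cmath
import math

def sides_of_eq18(T, S, th_S, d_th_S, U, th_U, d_th_U, P, th_P, dV, M, th_M, d_alpha):
    """Left and right sides of the complex relation, one generalized force M."""
    lhs = 1j * T * S * cmath.exp(1j * th_S) * d_th_S
    rhs = (1j * U * cmath.exp(1j * th_U) * d_th_U
           + P * cmath.exp(1j * th_P) * dV
           + M * cmath.exp(1j * th_M) * d_alpha)
    return lhs, rhs

def eq20_eq21(T, S, th_S, d_th_S, U, th_U, d_th_U, P, th_P, dV, M, th_M, d_alpha):
    """Both sides of Eq. (20) (real part) and Eq. (21) (imaginary part)."""
    re_l = -T * S * math.sin(th_S) * d_th_S
    re_r = -U * math.sin(th_U) * d_th_U + P * math.cos(th_P) * dV + M * math.cos(th_M) * d_alpha
    im_l = T * S * math.cos(th_S) * d_th_S
    im_r = U * math.cos(th_U) * d_th_U + P * math.sin(th_P) * dV + M * math.sin(th_M) * d_alpha
    return (re_l, re_r), (im_l, im_r)

vals = dict(T=1.2, S=0.8, th_S=0.3, d_th_S=0.01, U=1.5, th_U=0.25, d_th_U=0.02,
            P=0.7, th_P=1.8, dV=0.05, M=0.4, th_M=-0.6, d_alpha=0.03)
lhs, rhs = sides_of_eq18(**vals)
(re_l, re_r), (im_l, im_r) = eq20_eq21(**vals)
print(abs(lhs.real - re_l), abs(rhs.real - re_r), abs(lhs.imag - im_l), abs(rhs.imag - im_r))
```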

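For the special case of the relativistic real gas the generalized extensive
variable is α = T_R and the generalized force is M̃_α = S̃_R, where S̃_R is the
complex number generalization of the scalar parameter S_R that appears in
Reference 8. For the real gas, equation (18) becomes

$$jT\tilde S\,d\theta_S = j\tilde U\,d\theta_U + \tilde P\,dV + \tilde S_R\,dT_R \tag{22}$$

where

$$\tilde S_R = S_R e^{j\theta_{S_R}} \tag{23}$$

is the following complex number generalization of the scalar result in Reference 8:

$$\tilde S_R = -\tfrac{1}{2}NRTn^2\left(\frac{\partial\tilde C}{\partial T_R}\right)_T \tag{24}$$

where

$$S_R = -\tfrac{1}{2}NRTn^2\left[\left(\frac{\partial C}{\partial T_R}\right)^2 + \left(C\frac{\partial\theta_C}{\partial T_R}\right)^2\right]^{1/2} \tag{25}$$

$$\tan\theta_{S_R} = \frac{C\,\partial\theta_C/\partial T_R}{\partial C/\partial T_R} \tag{26}$$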
and where N = number of moles. The real and imaginary parts of equation (22)
can be written as

$$-TS\sin\theta_S\,d\theta_S = -U\sin\theta_U\,d\theta_U + P\cos\theta_P\,dV + S_R\cos\theta_{S_R}\,dT_R \tag{27}$$

$$TS\cos\theta_S\,d\theta_S = U\cos\theta_U\,d\theta_U + P\sin\theta_P\,dV + S_R\sin\theta_{S_R}\,dT_R \tag{28}$$

which are the thermodynamic equations for an ultrafast process in a real gas
with both S and U held constant. If the process is truly adiabatic, with S̃ =
constant, dS = 0 and dθ_S = 0, then the left-hand sides of equations (27) and
(28) must be set equal to zero. Note that in general U = U(T,V,T_R),
θ_U = θ_U(T,V,T_R), S = S(T,V,T_R) and θ_S = θ_S(T,V,T_R).

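In order to utilize equations (20) and (21) it is necessary to evaluate the
differentials dθ_S and dθ_U for the case of an ultrafast process with both U and
S held constant. These are obtained from the following total derivatives:

$$\left(\frac{d\theta_S}{d\alpha}\right)_{U,S} = \frac{\partial\theta_S}{\partial\alpha} + \frac{\partial\theta_S}{\partial V}\left(\frac{dV}{d\alpha}\right)_{U,S} + \frac{\partial\theta_S}{\partial T}\left(\frac{dT}{d\alpha}\right)_{U,S} \tag{29}$$

$$\left(\frac{d\theta_U}{d\alpha}\right)_{U,S} = \frac{\partial\theta_U}{\partial\alpha} + \frac{\partial\theta_U}{\partial V}\left(\frac{dV}{d\alpha}\right)_{U,S} + \frac{\partial\theta_U}{\partial T}\left(\frac{dT}{d\alpha}\right)_{U,S} \tag{30}$$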

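where (dV/dα)_{U,S} and (dT/dα)_{U,S} are obtained from the following two
conditions, which state that S and U are constant:

$$\frac{\partial S}{\partial\alpha} + \frac{\partial S}{\partial V}\left(\frac{dV}{d\alpha}\right)_{U,S} + \frac{\partial S}{\partial T}\left(\frac{dT}{d\alpha}\right)_{U,S} = 0 \tag{31}$$

$$\frac{\partial U}{\partial\alpha} + \frac{\partial U}{\partial V}\left(\frac{dV}{d\alpha}\right)_{U,S} + \frac{\partial U}{\partial T}\left(\frac{dT}{d\alpha}\right)_{U,S} = 0 \tag{32}$$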

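In general S = S(α,V,T), θ_S = θ_S(α,V,T), U = U(α,V,T) and θ_U = θ_U(α,V,T). From
equations (31) and (32) it follows that

$$\left(\frac{dV}{d\alpha}\right)_{U,S} = \frac{(\partial U/\partial T)(\partial S/\partial\alpha) - (\partial S/\partial T)(\partial U/\partial\alpha)}{(\partial S/\partial T)(\partial U/\partial V) - (\partial U/\partial T)(\partial S/\partial V)} \tag{33}$$

$$\left(\frac{dT}{d\alpha}\right)_{U,S} = \frac{(\partial S/\partial V)(\partial U/\partial\alpha) - (\partial U/\partial V)(\partial S/\partial\alpha)}{(\partial S/\partial T)(\partial U/\partial V) - (\partial U/\partial T)(\partial S/\partial V)} \tag{34}$$

Eliminating dα from equations (33) and (34) gives

$$\left(\frac{dT}{dV}\right)_{U,S} = \frac{(\partial S/\partial V)(\partial U/\partial\alpha) - (\partial U/\partial V)(\partial S/\partial\alpha)}{(\partial U/\partial T)(\partial S/\partial\alpha) - (\partial S/\partial T)(\partial U/\partial\alpha)} \tag{35}$$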
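Equations (33) and (34) are Cramer's rule applied to the two linear constant-S and constant-U conditions (31) and (32), and (35) is their ratio. The Python sketch below solves that 2×2 system at one state point, confirms that both conditions are satisfied, and checks the ratio against the closed form. The partial-derivative values are hypothetical, chosen only to exercise the formulas.

```python
def slopes_at_constant_U_S(S_a, S_V, S_T, U_a, U_V, U_T):
    """Solve Eqs. (31)-(32) for (dV/dalpha, dT/dalpha), written as in Eqs. (33)-(34)."""
    den = S_T * U_V - U_T * S_V
    dV_da = (U_T * S_a - S_T * U_a) / den          # Eq. (33)
    dT_da = (S_V * U_a - U_V * S_a) / den          # Eq. (34)
    return dV_da, dT_da

def dT_dV_at_constant_U_S(S_a, S_V, S_T, U_a, U_V, U_T):
    """Eq. (35): d(alpha) eliminated between Eqs. (33) and (34)."""
    return (S_V * U_a - U_V * S_a) / (U_T * S_a - S_T * U_a)

# Hypothetical partial derivatives at one state point (not from the paper).
d = dict(S_a=0.4, S_V=1.1, S_T=0.9, U_a=-0.7, U_V=0.3, U_T=2.0)
dV_da, dT_da = slopes_at_constant_U_S(**d)
res31 = d["S_a"] + d["S_V"] * dV_da + d["S_T"] * dT_da   # Eq. (31), should vanish
res32 = d["U_a"] + d["U_V"] * dV_da + d["U_T"] * dT_da   # Eq. (32), should vanish
print(res31, res32, dT_da / dV_da, dT_dV_at_constant_U_S(**d))
```

The determinant S_T·U_V − U_T·S_V must be nonzero for the constant-U, constant-S path to be well defined; the sketch does not guard against a singular case.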


Equation (35) relates T and V for the case of constant U and S. Only if
∂U/∂α ≠ 0 and ∂S/∂α ≠ 0 are T and V related. The derivative of the temperature
with respect to the reciprocal volume at constant U and S is given by

$$n\left(\frac{dT}{dn}\right)_{U,S} = -V\left(\frac{dT}{dV}\right)_{U,S} \tag{36}$$
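If α = α(V,T) is calculated from the condition U(α,T,V) = constant, then the
constant-S condition can be written as

$$\frac{\partial S}{\partial V} + \frac{\partial S}{\partial T}\left(\frac{dT}{dV}\right)_{U,S} = 0 \tag{37}$$

$$\frac{\partial S}{\partial T} + \frac{\partial S}{\partial V}\left(\frac{dV}{dT}\right)_{U,S} = 0 \tag{38}$$

Similarly, if α = α(V,T) is eliminated by the condition S(α,V,T) = constant,
then the constant-U condition can be written as

$$\frac{\partial U}{\partial V} + \frac{\partial U}{\partial T}\left(\frac{dT}{dV}\right)_{U,S} = 0 \tag{39}$$

$$\frac{\partial U}{\partial T} + \frac{\partial U}{\partial V}\left(\frac{dV}{dT}\right)_{U,S} = 0 \tag{40}$$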

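Neglecting the dα term in equations (20) and (21) gives

$$-TS\sin\theta_S\,d\theta_S \approx -U\sin\theta_U\,d\theta_U + P\cos\theta_P\,dV \tag{41}$$

$$TS\cos\theta_S\,d\theta_S \approx U\cos\theta_U\,d\theta_U + P\sin\theta_P\,dV \tag{42}$$

Then

$$-P\cos\theta_P \approx TS\sin\theta_S\left(\frac{d\theta_S}{dV}\right)_{U,S} - U\sin\theta_U\left(\frac{d\theta_U}{dV}\right)_{U,S} \tag{43}$$

$$P\sin\theta_P \approx TS\cos\theta_S\left(\frac{d\theta_S}{dV}\right)_{U,S} - U\cos\theta_U\left(\frac{d\theta_U}{dV}\right)_{U,S} \tag{44}$$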

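Now assume that θ_S ≈ θ_U in the trigonometric terms:

$$-P\cos\theta_P \approx \sin\theta_U\left[TS\left(\frac{d\theta_S}{dV}\right)_{U,S} - U\left(\frac{d\theta_U}{dV}\right)_{U,S}\right] \tag{45}$$

$$P\sin\theta_P \approx \cos\theta_U\left[TS\left(\frac{d\theta_S}{dV}\right)_{U,S} - U\left(\frac{d\theta_U}{dV}\right)_{U,S}\right] \tag{46}$$

From equations (45) and (46) it follows that

$$\tan\theta_P \approx -\cot\theta_U \tag{47}$$

$$\theta_P \approx \theta_U + \pi/2 \tag{48}$$

$$\cos\theta_P \approx -\sin\theta_U \tag{49}$$

$$\sin\theta_P \approx \cos\theta_U \tag{50}$$

Combining equations (45) through (50) gives

$$P \approx TS\left(\frac{d\theta_S}{dV}\right)_{U,S} - U\left(\frac{d\theta_U}{dV}\right)_{U,S} \tag{51}$$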
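Writing X for the common bracket TS(dθ_S/dV) − U(dθ_U/dV) of Eqs. (45) and (46), the phase and magnitude results say θ_P = θ_U + π/2 and P = X; this choice also gives tan θ_P = −cot θ_U. The Python sketch below plugs that choice back into (45) and (46) for arbitrary θ_U and X and reports the residuals.

```python
import math

def residuals(theta_U, X):
    """Residuals of Eqs. (45)-(46) and of tan(theta_P) = -cot(theta_U)
    for the choice P = X and theta_P = theta_U + pi/2."""
    P, theta_P = X, theta_U + math.pi / 2
    r45 = -P * math.cos(theta_P) - math.sin(theta_U) * X
    r46 = P * math.sin(theta_P) - math.cos(theta_U) * X
    r47 = math.tan(theta_P) + 1.0 / math.tan(theta_U)
    return r45, r46, r47

print(residuals(0.4, 1.3))   # all three entries vanish up to rounding
```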
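where

$$\left(\frac{d\theta_S}{dV}\right)_{U,S} = \frac{\partial\theta_S}{\partial V} + \frac{\partial\theta_S}{\partial T}\left(\frac{dT}{dV}\right)_{U,S} + \frac{\partial\theta_S}{\partial\alpha}\left(\frac{d\alpha}{dV}\right)_{U,S} \tag{52}$$

$$\left(\frac{d\theta_U}{dV}\right)_{U,S} = \frac{\partial\theta_U}{\partial V} + \frac{\partial\theta_U}{\partial T}\left(\frac{dT}{dV}\right)_{U,S} + \frac{\partial\theta_U}{\partial\alpha}\left(\frac{d\alpha}{dV}\right)_{U,S} \tag{53}$$

where (dT/dV)_{U,S} is given by equation (35) and (dα/dV)_{U,S} is the reciprocal
of (dV/dα)_{U,S} in equation (33). Equations (48) and (51) give the pressure
associated with an ultrafast thermodynamic process that has both U and S held
constant.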

The magnitude of the pressure given by equation (51) can be written in
terms of the reciprocal volume n = 1/V as follows:

$$P \approx Un^2\left(\frac{d\theta_U}{dn}\right)_{U,S} - TSn^2\left(\frac{d\theta_S}{dn}\right)_{U,S} = \mathcal{E}n\left(\frac{d\theta_U}{dn}\right)_{U,S} - T\mathcal{S}n\left(\frac{d\theta_S}{dn}\right)_{U,S} \tag{54}$$

where the energy density ℰ and the entropy density 𝒮 are given by

$$\mathcal{E} = U/V = nU \tag{55}$$

$$\mathcal{S} = S/V = nS \tag{56}$$

where U and S are constants in this paper. A further approximation for the
pressure can be obtained by taking θ_S ≈ θ_U in equations (51) and (54), with the
result

$$P \approx (TS - U)\left(\frac{d\theta_U}{dV}\right)_{U,S} = (\mathcal{E} - T\mathcal{S})\,n\left(\frac{d\theta_U}{dn}\right)_{U,S} \tag{57}$$

Equation (57) has a proper T = 0 limit.

Equation (54) is not an equation of state but rather gives the pressure for
a thermodynamic process for which U and S are constants. Therefore P = P(T) or
P = P(n), because V and T are related by equation (35). The total derivative
dP/dn can be calculated from equation (54) as follows:

$$\left(\frac{dP}{dn}\right)_{U,S} = U\left[2n\left(\frac{d\theta_U}{dn}\right)_{U,S} + n^2\left(\frac{d^2\theta_U}{dn^2}\right)_{U,S}\right] - TS\left[2n\left(\frac{d\theta_S}{dn}\right)_{U,S} + n^2\left(\frac{d^2\theta_S}{dn^2}\right)_{U,S}\right] - S\left(\frac{dT}{dn}\right)_{U,S}n^2\left(\frac{d\theta_S}{dn}\right)_{U,S} \tag{58}$$

Similarly,

$$\left(\frac{dP}{dT}\right)_{U,S} = \left(\frac{dP}{dn}\right)_{U,S}\left(\frac{dn}{dT}\right)_{U,S} \tag{59}$$

where (dn/dT)_{U,S} is given by equations (35) and (36).

Consider now the case of an ultrafast adiabatic process in which the entropy
S̃ and U remain fixed. For this case dS = 0, dθ_S = 0 and dU = 0. Then equation
(18) gives

$$j\tilde U\,d\theta_U + \tilde P\,dV + \sum_\alpha \tilde M_\alpha\,d\alpha = 0 \tag{60}$$

Neglecting dα gives

$$\tilde P = -U\left(\frac{d\theta_U}{dV}\right)_{U,S}e^{j(\pi/2+\theta_U)} \tag{61}$$

$$P = -U\left(\frac{d\theta_U}{dV}\right)_{U,S} \tag{62}$$

$$\theta_P = \pi/2 + \theta_U \tag{63}$$

For this case θ_U is not approximately equal to θ_S, because θ_S = constant. The
derivative in equation (62) is obtained from equations (33), (35) and (53). For
this case two parameters, α and β, are required in equation (60) to evaluate the
derivative in equation (62).

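For the case of broken symmetry of space the first law of thermodynamics is
written as

$$dQ = dU + P\,|dV| \tag{64}$$

$$|dV| = \sec\theta_{V,V}\,dV \tag{65}$$

$$\tan\theta_{V,V} = V\,\partial\theta_V/\partial V \tag{66}$$

From equations (64) and (65) it follows that, in order to obtain the basic
equations of thermodynamics for broken-symmetry space, the substitution
P → P sec θ_{V,V} is made in the basic thermodynamic equations, such as those
given in Reference 18. For instance, the trace equation (2) becomes

$$U + T\left(\frac{\partial U}{\partial T}\right)_{PV\sec\theta_{V,V}} - 3V\frac{\partial}{\partial V}(PV\sec\theta_{V,V})_U = U_a + T\left(\frac{\partial U_a}{\partial T}\right)_{P_aV} \tag{67}$$

while equation (3) becomes

$$\frac{\partial U}{\partial V} = T\frac{\partial}{\partial T}(P\sec\theta_{V,V}) - P\sec\theta_{V,V} = \left(T\frac{\partial P}{\partial T} - P\right)\sec\theta_{V,V} + PT\frac{\partial}{\partial T}(\sec\theta_{V,V}) \tag{68}$$

For ∂θ_V/∂T = 0, equation (68) becomes

$$\cos\theta_{V,V}\,\frac{\partial U}{\partial V} = T\frac{\partial P}{\partial T} - P \tag{69}$$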
For T = 0, equation (68) or (69) gives

$$P = -\cos\theta_{V,V}\,\frac{\partial U}{\partial V} \tag{70}$$

In general the broken symmetry of space lowers the calculated thermodynamic
pressure. For example, the broken internal symmetry of space requires equation
(13) for the pressure of real gases with broken internal symmetry to be written as

$$P = nRT\cos\theta_{V,V}(1 + Bn + Cn^2 + \cdots) \tag{71}$$

For the case of an ultrafast thermodynamic process equation (64) becomes

$$jT\tilde S\,d\theta_S = j\tilde U\,d\theta_U + \tilde P\sec\theta_{V,V}\,dV \tag{72}$$

and equations (57) and (62) become, respectively,

$$P \approx \cos\theta_{V,V}(\mathcal{E} - T\mathcal{S})\,n\left(\frac{d\theta_U}{dn}\right)_{U,S} \tag{73}$$

$$P = -\cos\theta_{V,V}\,U\left(\frac{d\theta_U}{dV}\right)_{U,S} \tag{74}$$

For radial symmetry θ_{V,V} = θ_{r,r}, where r = radial coordinate. Because
θ_{r,r} is related to the internal phase angle θ_r of the radial coordinate, it
follows that the laws of thermodynamics, such as equations (67) through (74),
depend on the internal phase structure of space. But the value of θ_r on a
macroscopic scale depends on gravity; for instance, at the earth's surface
θ_r = 5.7°. Therefore the calculations of thermodynamics must include the effects
of gravity.

3. CONCLUSION. For systems, such as the real gas, with broken internal
symmetries in the pressure, internal energy and entropy, it is possible to have a
thermodynamic process that occurs so fast as to keep the magnitudes of the
entropy and internal energy constant. This is possible only for systems like the
real gases, which have a parameter (like the relativity temperature T_R) that
changes during the process. Such a process involves a change of structure of the
molecules or atoms of the system, as in the case of chemical or nuclear
reactions.


ACKNOWLEDGEMENT 

The  author  wishes  to  thank  Elizabeth  K.  Klein  for  typing  this  paper. 

REFERENCES 


1. Trimble, V., "Supernovae. Part I: The Events," Rev. Mod. Phys., Vol. 54,
No. 4, Oct. 1982, p. 1183.

2. Fowler, W. A., "Experimental and Theoretical Astrophysics: The Quest for
the Origin of the Elements," Rev. Mod. Phys., Vol. 56, No. 2, Apr. 1984, p. 149.

3. Woosley, S. E. and Phillips, M. M., "Supernova 1987A!," Science, Vol. 240,
May 6, 1988, p. 750.




4. Bethe, H. A., "Nuclear Physics Needed for the Theory of Supernovae," Ann.
Rev. Nucl. Part. Sci., Vol. 38, 1988, p. 1.

5. Trimble, V., "1987A: The Greatest Supernova Since Kepler," Rev. Mod. Phys.,
Vol. 60, No. 4, Oct. 1988, p. 859.

6. Thorne, K. S., "Gravitational Wave Research: Current Status and Future
Prospects," Rev. Mod. Phys., Vol. 52, No. 2, Part 1, April 1980.

7. Press, W. H. and Thorne, K. S., "Gravitational Wave Astronomy," Ann. Rev.
Astron. Astrophys., Vol. 10, 1972, p. 335.

8. Weiss, R. A., "Relativistic Wave Equations for Real Gases," Fourth Army
Conference on Applied Mathematics and Computing, Cornell University, Ithaca,
ARO 87-1, May 27-30, 1986, p. 341.

9. Zewail, A., "Laser Femtochemistry," Science, Vol. 242, 23 Dec. 1988, p. 1645.

10. Peters, K. S. and Snyder, G. J., "Time-Resolved Photoacoustic Calorimetry:
Probing the Energetics and Dynamics of Fast Chemical and Biochemical Reactions,"
Science, Vol. 241, 26 Aug. 1988, p. 1053.

11. Davis, W. C., "The Detonation of Explosives," Scientific American, Vol. 256,
No. 5, May 1987, p. 106.

12. Yablonovitch, E., "Energy Conservation in the Picosecond and Subpicosecond
Photoelectric Effect," Phys. Rev. Lett., Vol. 60, No. 9, 29 Feb. 1988, p. 795.

13. Hicks, J. M., Urbach, L. E., Plummer, E. W. and Dai, H. L., "Can Pulsed
Laser Excitation of Surfaces be Described by a Thermal Model?," Phys. Rev. Lett.,
Vol. 61, 28 Nov. 1988, p. 2588.

14. Murnane, M. M., Kapteyn, H. C. and Falcone, R. W., "High-Density Plasmas
Produced by Ultrafast Laser Pulses," Phys. Rev. Lett., Vol. 62, 9 Jan. 1989,
p. 155.

15. Weiss, R. A., Relativistic Thermodynamics, Vols. 1 and 2, Exposition Press,
New York, 1976.

16. Weiss, R. A., "Relativistic Wave Equations for Solids and Low Temperature
Quantum Systems," Third Army Conference on Applied Mathematics and Computing,
Georgia Institute of Technology, ARO 86-1, May 13-16, 1985, p. 717.

17. Weiss, R. A., "Relativistic Thermodynamics of Real Gases with Broken
Internal Symmetry," Sixth Army Conference on Applied Mathematics and Computing,
Univ. of Colorado, Boulder, ARO 89-1, 31 May-3 June, 1988, p. 203.

18. Weiss, R. A., "Thermodynamic Gauge Theory of Solids and Quantum Liquids
with Internal Phase," Fifth Army Conference on Applied Mathematics and Computing,
West Point, New York, ARO 88-1, June 15-18, 1987, p. 649.

19. Hirschfelder, J. O., Curtiss, C. F., and Bird, R. B., Molecular Theory of
Gases and Liquids, John Wiley, New York, 1954.




THE  INTERNAL  PHASE  STRUCTURE  OF  ATOMS 


Richard  A.  Weiss 

U.  S.  Army  Engineer  Waterways  Experiment  Station 
Vicksburg,  Mississippi  39180 


ABSTRACT. The three-dimensional Schrödinger equation for hydrogen-like atoms
under pressure is solved, and the spectra and eigenfunctions are calculated
using the fact that in a pressure field the coordinates have internal phase
angles. Because the coordinates have broken internal symmetries, the energy
eigenvalues are complex numbers whose real parts yield the measurable quantities
that can be experimentally tested by examining the spectra of one-electron atoms
under pressure. The magnetic, azimuthal and principal quantum numbers must be
represented as complex numbers for hydrogen-like atoms under pressure. It is
found that under pressure hydrogen-like atoms will exhibit a pressure-dependent
fine structure in which the energy levels of the valence electron depend on the
magnetic quantum number as well as on the principal quantum number. The pressure
dependence of the spectra of hydrogen-like atoms is determined. This research
will have applications to stellar atmospheres and to gases at high pressures
associated with conventional and nuclear explosions.

1. INTRODUCTION. The early development of atomic and nuclear physics made minimal use of gauge field theory because the only gauge field known was electromagnetism.¹ The importance of gauge fields was fully realized only in the past twenty-five years, through the search for a unifying principle behind the four fundamental interactions.² Gauge theories are now important in many scientific and mathematical disciplines.³,⁴ Recently a gauge theory of relativistic thermodynamics has been developed which suggests that the pressure in a bulk matter system has a broken internal symmetry and must be represented by a complex number.⁵,⁶ A set of renormalization group equations has been developed which gives the recipe for calculating the magnitude and internal phase angle of the pressure as a function of temperature and density for an interacting bulk matter system.⁵,⁶ This can be applied to gases, liquids or solids.

If the pressure has a broken internal symmetry, then Euler's equations of motion suggest that the space and time coordinates must also have broken internal symmetries and be treated as complex numbers.⁷ Arguments from string theory also predict a complex number coordinate representation.⁸ The complex number values of the space and time coordinates imply that the basic wave equations of classical and quantum physics must also contain these broken internal coordinate symmetries. For instance, the Schrödinger and Dirac equations must be written as complex number coordinate equations whose eigenvalues and eigenfunctions have broken internal symmetries.⁸ Therefore, indirectly, a gauge theory of relativistic thermodynamics predicts microscopic effects which affect the basic calculations of atomic physics and the structure of atoms. In fact, atoms located in a pressure field should exhibit an internal phase structure which depends on the magnitude of the ambient pressure.

The Bohr atom under zero external forces does not exhibit an internal phase structure.

Atoms subjected to pressure or other external forces have an internal phase structure. This can be seen in a simple fashion by noting that, for the case of a Bohr electron under the influence of an external radial force, equation (2) can be written as

$$ \mu\bar\omega^2\bar r = \frac{Ze^2}{\bar r^2} + \bar F \tag{13} $$

where $\bar F = F e^{j\theta_F}$ = complex number external force acting on the electron. The external force can be transmitted to the electron by electrical forces from adjacent atoms and is ultimately related to the complex number pressure, which can be written as

$$ \bar P = P e^{j\theta_P} \tag{14} $$

where $\theta_P$ = internal phase angle of the pressure. A more sophisticated approach to the problem of the Bohr atom in a pressure field is to solve Schrödinger's equation for this case. When this is done it is found that the presence of a broken symmetry pressure requires that the principal quantum number that appears in the Bohr quantization equation (1) be a complex number, so that

$$ \mu\bar r^2\bar\omega = \bar n\hbar \tag{15} $$
$$ \bar n = |\bar n|\,e^{j\theta_n} \tag{16} $$

where $\bar n$ = complex number principal quantum number, which is related to the broken symmetry of the azimuthal angle and ultimately to the complex number pressure (Section 5). For a hydrogen-like atom under pressure the quantization condition in equation (15) yields

$$ \mu r^2\omega = |\bar n|\hbar \tag{17} $$
$$ 2\theta_r + \theta_\omega = \theta_n \tag{18} $$

while the force balance equation (13) gives

$$ \mu\omega^2 r\cos(2\theta_\omega + \theta_r) = \frac{Ze^2}{r^2}\cos(2\theta_r) + F\cos\theta_F \tag{19} $$
$$ \mu\omega^2 r\sin(2\theta_\omega + \theta_r) = -\frac{Ze^2}{r^2}\sin(2\theta_r) + F\sin\theta_F \tag{20} $$

Equations (17) through (20) are four scalar equations which can be solved for $r$, $\theta_r$, $\omega$ and $\theta_\omega$ in terms of $|\bar n|$, $\theta_n$, $F$ and $\theta_F$, or ultimately in terms of $P$ and $\theta_P$. For $F \neq 0$ the solution of the Schrödinger equation becomes extremely difficult, and therefore this case is not considered in this paper. Instead it is assumed that $F \simeq 0$, so that $\theta_r$ and $\theta_\omega$ are obtained from equations (8) and (18) to be

$$ \theta_r \simeq 2\theta_n, \qquad \theta_\omega \simeq -3\theta_n \tag{20A} $$

and the corresponding Schrödinger equation can also be solved.
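As a numerical sanity check on equation (20A), the following Python sketch (an illustration, not part of the original analysis) takes $F = 0$, uses arbitrary unit values for $\mu$, $Ze^2$ and $r$, fixes $\omega$ from the magnitude balance $\mu\omega^2 r = Ze^2/r^2$, and confirms that $\theta_r = 2\theta_n$, $\theta_\omega = -3\theta_n$ satisfy both the complex force balance (13) and the phase condition (18). With these phases both sides of (13) carry the common factor $e^{-4j\theta_n}$. The function names are hypothetical.

```python
import cmath
import math


def bohr_phases(theta_n):
    """Internal phases from equation (20A) with F = 0: (theta_r, theta_omega)."""
    return 2.0 * theta_n, -3.0 * theta_n


def force_balance_residual(theta_n, mu=1.0, ze2=1.0, r=1.0):
    """Complex residual mu*omega^2*r - Ze^2/r^2 of equation (13) with F = 0.

    mu, ze2 (standing for Z e^2) and r are illustrative values, not physical data.
    """
    theta_r, theta_w = bohr_phases(theta_n)
    # Magnitude of omega from the magnitude balance mu*omega^2*r = Ze^2/r^2.
    w = math.sqrt(ze2 / (mu * r**3))
    r_bar = r * cmath.exp(1j * theta_r)
    w_bar = w * cmath.exp(1j * theta_w)
    return mu * w_bar**2 * r_bar - ze2 / r_bar**2


theta_n = 0.1
theta_r, theta_w = bohr_phases(theta_n)
print(abs(force_balance_residual(theta_n)))  # rounding-level: (13) holds
print(2 * theta_r + theta_w - theta_n)       # rounding-level: (18) holds
```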


This paper considers the solution of Schrödinger's equation for hydrogen-like atoms in a three-dimensional space that exhibits broken internal symmetry. Section 2 gives the general form of Schrödinger's equation for a spherically symmetric potential with broken internal symmetries. Section 3 considers the azimuthal angle equation and introduces the complex magnetic quantum number. Section 4 treats the zenith angle equation, which introduces the complex azimuthal quantum number, while Section 5 solves the radial equation, which has a complex value of the principal quantum number. Section 6 presents the complex number wave functions for hydrogen-like atoms under pressure, and Section 7 determines the complex number energy eigenvalues. Only bound states are considered in this paper.




2. SCHRÖDINGER'S EQUATION FOR HYDROGEN WITH BROKEN INTERNAL SYMMETRIES. The three-dimensional Schrödinger equation in spherical polar coordinates for an electron in a central force field, in an atom whose space and time coordinates have broken internal symmetries, is written as a generalization of the standard form of this equation as follows:¹²⁻¹⁴

$$ \frac{\partial^2\bar\Psi}{\partial\bar r^2} + \frac{2}{\bar r}\frac{\partial\bar\Psi}{\partial\bar r} + \frac{1}{\bar r^2\sin\bar\psi}\frac{\partial}{\partial\bar\psi}\left(\sin\bar\psi\,\frac{\partial\bar\Psi}{\partial\bar\psi}\right) + \frac{1}{\bar r^2\sin^2\bar\psi}\frac{\partial^2\bar\Psi}{\partial\bar\phi^2} + \frac{2\mu}{\hbar^2}\left[\bar E - \bar V(\bar r)\right]\bar\Psi = 0 \tag{21} $$

where the complex number spherical polar coordinates are written as

$$ \bar r = r\,e^{j\theta_r} \tag{22} $$
$$ \bar\psi = \psi\,e^{j\theta_\psi} \tag{23} $$
$$ \bar\phi = \phi\,e^{j\theta_\phi} \tag{24} $$

where $\bar\Psi$ = complex number wave function, $\bar r$ = complex number radial coordinate, $\bar\psi$ = complex number zenith angle and $\bar\phi$ = complex number azimuthal angle. The broken symmetries of the coordinates are described by the internal phase angles $\theta_r$, $\theta_\psi$ and $\theta_\phi$, which are pressure dependent.⁷ The measured values of the coordinates are given by the real parts of the complex number coordinates in equations (22) through (24) and are written as⁷


$$ r_m = r\cos\theta_r \tag{25} $$
$$ \psi_m = \psi\cos\theta_\psi \tag{26} $$
$$ \phi_m = \phi\cos\theta_\phi \tag{27} $$

The complex number potential is written as

$$ \bar V = V e^{j\theta_V} \tag{28} $$

which for the Coulomb potential becomes

$$ \bar V = -\frac{Ze^2}{\bar r} \tag{29} $$
$$ V = -\frac{Ze^2}{r} \tag{30} $$
$$ \theta_V = -\theta_r \tag{31} $$

If the potential were directly measurable, the measured value would be given by⁷

$$ V_m = V\cos\theta_V = -\frac{Ze^2}{r}\cos\theta_r \tag{32} $$

Note that the effect of the external pressure is assumed only to make the coordinates complex numbers and not to change the basic form of the Coulomb potential. The complex number total energy is written as
$$ \bar E = E\,e^{j\theta_E} \tag{33} $$

while its measured value is given by

$$ E_m = E\cos\theta_E \tag{34} $$

Both $E$ and $\theta_E$ are determined in Section 7. The complex number wave function is written as

$$ \bar\Psi = \Psi\,e^{j\theta_\Psi} \tag{35} $$

and will be determined in Section 5. The measured wave function is given by

$$ \Psi_m = \Psi\cos\theta_\Psi \tag{36} $$

The complex number Schrödinger equation (21) can be separated into three component equations by following the standard recipe of writing the wave function as a product of three independent functions as follows:¹²⁻¹⁴

$$ \bar\Psi = \bar R(\bar r)\,\bar W(\bar\psi)\,\bar\Phi(\bar\phi) \tag{37} $$

where

$$ \bar R = R\,e^{j\theta_R} = \text{radial wave function} \tag{38} $$
$$ \bar W = W\,e^{j\theta_W} = \text{zenith angle wave function} \tag{39} $$
$$ \bar\Phi = \Phi\,e^{j\theta_\Phi} = \text{azimuthal angle wave function} \tag{40} $$

Combining equations (37) through (40) gives

$$ \Psi = R\,W\,\Phi \tag{41} $$
$$ \theta_\Psi = \theta_R + \theta_W + \theta_\Phi \tag{42} $$

Placing equation (37) into equation (21) yields the following generalizations of the standard azimuthal angle equation, zenith angle equation and radial equation respectively:¹²⁻¹⁴

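$$ \frac{d^2\bar\Phi}{d\bar\phi^2} + \bar M^2\bar\Phi = 0 \tag{43} $$
$$ \frac{1}{\sin\bar\psi}\frac{d}{d\bar\psi}\left(\sin\bar\psi\,\frac{d\bar W}{d\bar\psi}\right) + \left(\bar\beta - \frac{\bar M^2}{\sin^2\bar\psi}\right)\bar W = 0 \tag{44} $$
$$ \bar r^2\frac{d^2\bar R}{d\bar r^2} + 2\bar r\,\frac{d\bar R}{d\bar r} + \left(\bar k^2\bar r^2 - \bar\beta\right)\bar R = 0 \tag{45} $$

where

$$ \bar k^2 = \frac{2\mu}{\hbar^2}\left(\bar E - \bar V\right) \tag{46} $$
$$ \bar M = M\,e^{j\theta_M} \tag{47} $$
$$ \bar\beta = \beta\,e^{j\theta_\beta} \tag{48} $$

where $\bar M$ = complex magnetic quantum number, which will be determined in Section 3, and $\bar\beta$ = complex number constant that will be determined in Section 4. The solution of the complex number Schrödinger equation for a hydrogen-like atom requires, first, the solution of equations (43) through (45) and the determination of the six functions $R(r)$, $\theta_R(r)$, $W(\psi)$, $\theta_W(\psi)$, $\Phi(\phi)$ and $\theta_\Phi(\phi)$, and second, the determination of the values of $E$ and $\theta_E$ which give the complex number energy eigenvalues.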

3. THE AZIMUTHAL ANGLE EQUATION FOR A HYDROGEN-LIKE ATOM WITH BROKEN INTERNAL SYMMETRY. This section determines the solution of the complex number azimuthal equation and gives the magnitude and internal phase angle of the complex magnetic quantum number $\bar M$. The formal solution of equation (43) is written as¹²⁻¹⁴

$$ \bar\Phi = A\,e^{j\bar M\bar\phi} + B\,e^{-j\bar M\bar\phi} \tag{49} $$

It will now be shown that $\bar M\bar\phi$ must be a real number if $\bar\Phi$ is to be symmetrical (unchanged) under a $2\pi$ change of the value of $\phi_m$. In other words, because $\phi_m$ given by equation (27) is the measured azimuthal angle, the wave function in equation (49) must be unchanged under $\phi_m \to \phi_m + 2\pi$, and this implies the reality of $\bar M\bar\phi$. The reality condition for $\bar M\bar\phi$ can be written as

$$ \bar M\bar\phi = M\phi \tag{50} $$
$$ \theta_M + \theta_\phi = 0 \tag{51} $$

where equations (24) and (47) were used. In order to verify this conclusion, the complex numbers $\bar M$ and $\bar\phi$ are written in terms of their real and imaginary parts as follows:

$$ \bar M = M_R + jM_I = M(\cos\theta_M + j\sin\theta_M) \tag{52} $$
$$ \bar\phi = \phi_R + j\phi_I = \phi(\cos\theta_\phi + j\sin\theta_\phi) \tag{53} $$

Using equations (52) and (53) allows $\bar M\bar\phi$ to be written as

$$ \bar M\bar\phi = M_R\phi_R - M_I\phi_I + j(M_I\phi_R + M_R\phi_I) \tag{54} $$

If the imaginary part of equation (54) is zero, then

$$ \phi_I = -\frac{M_I\phi_R}{M_R} \tag{55} $$

and substituting equation (55) into equation (54) gives

$$ \bar M\bar\phi = \frac{(M_R^2 + M_I^2)\,\phi_R}{M_R} = \frac{M^2\phi_m}{M_R} \tag{56} $$

since $\phi_R = \phi\cos\theta_\phi = \phi_m$. This shows that if $\bar M\bar\phi$ is real it is also linear in $\phi_m$, the measured value of $\bar\phi$ given by equation (27).


The linearity in $\phi_m$ shown by equation (56) allows the azimuthal wave function given in equation (49) to remain unchanged under $\phi_m \to \phi_m + 2\pi$, by requiring

$$ M^2/M_R = m \tag{57} $$

where $m$ = standard magnetic quantum number, which is a positive or negative integer. Combining equations (51), (52) and (57) gives

$$ M = m\cos\theta_M = m\cos\theta_\phi \tag{58} $$

as the condition for symmetry under $\phi_m \to \phi_m + 2\pi$. The exponent terms in equation (49) can be written as

$$ \bar M\bar\phi = M\phi = m\phi\cos\theta_\phi = m\phi_m \tag{59} $$

and therefore

$$ \bar\Phi = A\,e^{jm\phi_m} + B\,e^{-jm\phi_m} \tag{60} $$

The magnitude and internal phase angle of the complex magnetic quantum number $\bar M$ are given by equations (58) and (51) respectively. The real and imaginary parts of $\bar M$ are given by

$$ M_R = m\cos^2\theta_\phi \tag{61} $$
$$ M_I = -m\sin\theta_\phi\cos\theta_\phi \tag{62} $$
$$ \bar M = m\cos\theta_\phi\,e^{-j\theta_\phi} \tag{63} $$

The interesting thing about equation (58) is that $M$ is not an integer but reduces to an integer for the symmetrical case of $\theta_\phi = 0$.
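A short numerical check of equations (50) through (63), written as a Python sketch for illustration only: it builds $\bar M$ from equation (63), multiplies it by $\bar\phi = \phi e^{j\theta_\phi}$, and confirms that the product is real and equal to $m\phi_m$, and that $M^2/M_R = m$ as required by equation (57). The helper names are hypothetical, and $m$, $\phi$ and $\theta_\phi$ are arbitrary test values.

```python
import cmath
import math


def complex_magnetic_number(m, theta_phi):
    """M-bar = m cos(theta_phi) exp(-j theta_phi), equation (63)."""
    return m * math.cos(theta_phi) * cmath.exp(-1j * theta_phi)


def azimuthal_exponent(m, phi, theta_phi):
    """Product M-bar * phi-bar with phi-bar = phi exp(j theta_phi), equation (24)."""
    phi_bar = phi * cmath.exp(1j * theta_phi)
    return complex_magnetic_number(m, theta_phi) * phi_bar


m, phi, theta_phi = 2, 1.7, 0.3
M_bar = complex_magnetic_number(m, theta_phi)
product = azimuthal_exponent(m, phi, theta_phi)
phi_m = phi * math.cos(theta_phi)          # measured azimuth, equation (27)
print(product.imag)                        # ~0: the exponent is real
print(product.real, m * phi_m)             # equal, as in equation (59)
print(abs(M_bar) ** 2 / M_bar.real)        # recovers the integer m, equation (57)
```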

4. THE ZENITH ANGLE EQUATION FOR A HYDROGEN-LIKE ATOM WITH BROKEN INTERNAL SYMMETRY. This section solves the complex number zenith angle equation (44) and determines the complex number parameter $\bar\beta$ that appears in this equation. This equation will be solved by a simple generalization of the standard technique used to solve the corresponding scalar form of this equation.¹²⁻¹⁴ Define the complex number $\bar\xi$ by

$$ \cos\bar\psi = \bar\xi = \xi\,e^{j\theta_\xi} $$

so that⁷

$$ \xi = \left[\cos^2(\psi\cos\theta_\psi) + \sinh^2(\psi\sin\theta_\psi)\right]^{1/2} $$
$$ \tan\theta_\xi = -\tan(\psi\cos\theta_\psi)\,\tanh(\psi\sin\theta_\psi) $$
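The magnitude and phase formulas for $\bar\xi = \cos\bar\psi$ follow from $\cos(a + jb) = \cos a\cosh b - j\sin a\sinh b$ with $a = \psi\cos\theta_\psi$ and $b = \psi\sin\theta_\psi$. The Python sketch below is an illustrative check, not part of the paper: it compares the closed forms against Python's built-in complex cosine for arbitrary test values. The helper name is hypothetical.

```python
import cmath
import math


def xi_from_formulas(psi, theta_psi):
    """Magnitude xi and tan(theta_xi) of cos(psi-bar) from the closed-form expressions."""
    a = psi * math.cos(theta_psi)
    b = psi * math.sin(theta_psi)
    xi = math.sqrt(math.cos(a) ** 2 + math.sinh(b) ** 2)
    tan_theta_xi = -math.tan(a) * math.tanh(b)
    return xi, tan_theta_xi


psi, theta_psi = 0.8, 0.2
xi_bar = cmath.cos(psi * cmath.exp(1j * theta_psi))     # direct complex cosine
xi, tan_theta_xi = xi_from_formulas(psi, theta_psi)
print(abs(xi_bar), xi)                                  # the two magnitudes agree
print(xi_bar.imag / xi_bar.real, tan_theta_xi)          # the two phase tangents agree
```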


Then equation (44) can be written as

$$ \frac{d}{d\bar\xi}\left[(1 - \bar\xi^2)\frac{d\bar W}{d\bar\xi}\right] + \left[\bar\beta - \frac{\bar M^2}{1 - \bar\xi^2}\right]\bar W = 0 \tag{69} $$

where $\bar M$ is given by equation (63).

The solution of equation (69) follows from the standard procedure for the solution of the corresponding scalar equation by writing¹²⁻¹⁴

$$ \bar W = (1 - \bar\xi^2)^{\bar M'/2}\,\bar G(\bar\xi) \tag{70} $$

where

$$ \bar M' = |m|\cos\theta_\phi\,e^{-j\theta_\phi}, \qquad M' = |m|\cos\theta_\phi \tag{70A} $$
$$ M'_R = |m|\cos^2\theta_\phi, \qquad M'_I = -|m|\sin\theta_\phi\cos\theta_\phi \tag{70B} $$

which gives¹²⁻¹⁴

$$ (1 - \bar\xi^2)\frac{d^2\bar G}{d\bar\xi^2} - 2(\bar M' + 1)\,\bar\xi\,\frac{d\bar G}{d\bar\xi} + \left[\bar\beta - \bar M'(\bar M' + 1)\right]\bar G = 0 \tag{71} $$

The solution of equation (71) is obtained by a power series expansion and gives the following generalization of the standard results for real numbers:¹²⁻¹⁴

$$ \frac{a_{\sigma+2}}{a_\sigma} = \frac{(\sigma + \bar M')(\sigma + \bar M' + 1) - \bar\beta}{(\sigma + 1)(\sigma + 2)} \tag{72} $$

where $a_\sigma$ and $a_{\sigma+2}$ = coefficients in a power series expansion of $\bar G$, and $\sigma$ = integer. Equation (72) shows that the only way the power series breaks off at a finite term is to have

$$ \bar\beta = \bar\ell(\bar\ell + 1) \tag{73} $$

where

$$ \bar\ell = \bar M' + \nu \tag{74} $$

where $\nu$ = integer. Equations (73) and (74) are recognized as complex number generalizations of the standard scalar results.¹²⁻¹⁴ The important thing about equations (72) through (74) is that $\bar\beta$, $\bar M'$ and $\bar\ell$ are complex numbers and not integers as in the standard case. The only integer requirement is that $\bar\ell$ and $\bar M'$ differ by an integer, as shown in equation (74). $\bar\ell$ is a complex number generalization of the standard integer azimuthal quantum number $\ell$.

The solution to equation (69) can then be written using equations (70), (72) and (73):

$$ \begin{aligned} \bar W = {} & (1 - \bar\xi^2)^{\bar M'/2}\, c_0\Big\{1 + \tfrac{1}{2}\big[\bar M'(\bar M' + 1) - \bar\ell(\bar\ell + 1)\big]\bar\xi^2 \\ & + \tfrac{1}{24}\big[\bar M'(\bar M' + 1) - \bar\ell(\bar\ell + 1)\big]\big[(\bar M' + 2)(\bar M' + 3) - \bar\ell(\bar\ell + 1)\big]\bar\xi^4 + \cdots\Big\} \\ & + (1 - \bar\xi^2)^{\bar M'/2}\, c_1\Big\{\bar\xi + \tfrac{1}{6}\big[(\bar M' + 1)(\bar M' + 2) - \bar\ell(\bar\ell + 1)\big]\bar\xi^3 \\ & + \tfrac{1}{120}\big[(\bar M' + 1)(\bar M' + 2) - \bar\ell(\bar\ell + 1)\big]\big[(\bar M' + 3)(\bar M' + 4) - \bar\ell(\bar\ell + 1)\big]\bar\xi^5 + \cdots\Big\} \end{aligned} \tag{75} $$

which clearly breaks off when equation (74) is satisfied for a series of integers $\nu = 0, 1, 2, \ldots$. The solutions given in equation (75) are simple generalizations of the standard associated Legendre functions, so that formally

The value of $\bar M'$ that appears in equations (75) through (80) is given by equation (70A). In this way the solutions of the zenith angle equation for a hydrogen-like atom with broken internal symmetry are obtained. The solutions given in equations (75) through (80) are listed in Section 6 for specific atomic shells.
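To see the break-off condition (73)-(74) at work with complex parameters, the Python sketch below (an illustration with arbitrarily chosen values, not code from the paper) iterates recursion (72), taking $\bar M'$ from equation (70A) and $\bar\beta = \bar\ell(\bar\ell + 1)$ with $\bar\ell = \bar M' + \nu$. For even $\nu$ the even series terminates after the $\bar\xi^{\nu}$ term; for odd $\nu$ the odd series does. The function name `zenith_coefficients` is hypothetical.

```python
import cmath
import math


def zenith_coefficients(m_prime, beta, start, n_terms=6):
    """Coefficients a_sigma of G-bar from recursion (72).

    start = 0 gives the even series (a_0 = 1), start = 1 the odd series (a_1 = 1).
    Returns a dict {sigma: a_sigma} for sigma = start, start + 2, ...
    """
    coeffs = {start: 1.0 + 0.0j}
    sigma = start
    for _ in range(n_terms):
        ratio = ((sigma + m_prime) * (sigma + m_prime + 1) - beta) / ((sigma + 1) * (sigma + 2))
        coeffs[sigma + 2] = coeffs[sigma] * ratio
        sigma += 2
    return coeffs


m, theta_phi, nu = 1, 0.25, 2
m_prime = abs(m) * math.cos(theta_phi) * cmath.exp(-1j * theta_phi)  # equation (70A)
ell_bar = m_prime + nu                                               # equation (74)
beta = ell_bar * (ell_bar + 1)                                       # equation (73)
even = zenith_coefficients(m_prime, beta, start=0)
print(abs(even[2]))   # nonzero
print(abs(even[4]))   # zero (up to rounding): the series breaks off after xi^2
```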

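It is clear that the integer $\nu$ in equation (74) must be given by $\nu = \ell - |m|$, because equation (74) is also valid for the case of zero internal phase, and therefore

$$ \begin{aligned} \bar\ell &= \bar M' + \ell - |m| \\ &= |m|\cos\theta_\phi\,e^{-j\theta_\phi} + \ell - |m| \\ &= \ell - |m|\sin^2\theta_\phi - j|m|\sin\theta_\phi\cos\theta_\phi \end{aligned} \tag{81} $$

where $\bar M'$ is given by equation (70A).

The complex azimuthal angular momentum quantum number $\bar\ell$ can be written as

$$ \bar\ell = |\bar\ell|\,e^{j\theta_\ell} \tag{82} $$

Combining equations (74), (81) and (82) gives

$$ |\bar\ell|\cos\theta_\ell = M'\cos\theta_{M'} + \ell - |m| \tag{83} $$
$$ |\bar\ell|\sin\theta_\ell = M'\sin\theta_{M'} \tag{84} $$

where $M'$ and $\theta_{M'}$ are given by equations (70A) and (51) respectively. Combining equations (51), (58), (83) and (84) gives

$$ \tan\theta_\ell = \frac{M'\sin\theta_{M'}}{M'\cos\theta_{M'} + \ell - |m|} = -\frac{|m|\sin\theta_\phi\cos\theta_\phi}{\ell - |m|\sin^2\theta_\phi} \tag{85} $$

which for small $\theta_\phi$ gives

$$ \theta_\ell \simeq -\frac{|m|\,\theta_\phi}{\ell} \tag{85A} $$
$$ |\bar\ell|^2 = \ell^2\left[1 - \frac{|m|}{\ell}\left(2 - \frac{|m|}{\ell}\right)\sin^2\theta_\phi\right] = \ell^2 - |m|(2\ell - |m|)\sin^2\theta_\phi \tag{86} $$

Note that if $\theta_\phi = 0$ then $\theta_\ell = 0$ and $|\bar\ell| = \ell$. Also, if $m = 0$ then $\theta_\ell = 0$ and $|\bar\ell| = \ell$. From equation (86) it follows that in general $|\bar\ell| \le \ell$. The interesting point is that equation (86) shows that $|\bar\ell|$ is not an integer and depends on the value of $|m|$ as follows:

$\nu = 0$, $\ell = |m|$:
$$ |\bar\ell| = \ell\cos\theta_\phi = |m|\cos\theta_\phi \tag{87} $$

$\nu = 1$, $\ell = |m| + 1$:
$$ |\bar\ell|^2 = (|m| + 1)^2\left[1 - \frac{|m|(|m| + 2)}{(|m| + 1)^2}\sin^2\theta_\phi\right] = 1 + |m|(|m| + 2)\cos^2\theta_\phi = \ell^2 - (\ell^2 - 1)\sin^2\theta_\phi \tag{88} $$

$\nu = 2$, $\ell = |m| + 2$:
$$ |\bar\ell|^2 = (|m| + 2)^2\left[1 - \frac{|m|(|m| + 4)}{(|m| + 2)^2}\sin^2\theta_\phi\right] = 4 + |m|(|m| + 4)\cos^2\theta_\phi = \ell^2 - (\ell^2 - 4)\sin^2\theta_\phi \tag{89} $$

$\nu = 3$, $\ell = |m| + 3$:
$$ |\bar\ell|^2 = (|m| + 3)^2\left[1 - \frac{|m|(|m| + 6)}{(|m| + 3)^2}\sin^2\theta_\phi\right] = 9 + |m|(|m| + 6)\cos^2\theta_\phi = \ell^2 - (\ell^2 - 9)\sin^2\theta_\phi \tag{90} $$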
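Equations (81) through (90) can be cross-checked numerically. The Python sketch below is an illustrative check (helper names are hypothetical, and the quantum numbers and angle are arbitrary test values): it computes $\bar\ell$ from the last line of equation (81), confirms it equals $\bar M' + \ell - |m|$, and compares $|\bar\ell|^2$ with equation (86) and the exact phase with the small-angle form (85A).

```python
import cmath
import math


def complex_ell(ell, m, theta_phi):
    """Complex azimuthal quantum number ell-bar from the last line of equation (81)."""
    s, c = math.sin(theta_phi), math.cos(theta_phi)
    return ell - abs(m) * s * s - 1j * abs(m) * s * c


def ell_magnitude_sq(ell, m, theta_phi):
    """|ell-bar|^2 from equation (86)."""
    return ell**2 - abs(m) * (2 * ell - abs(m)) * math.sin(theta_phi) ** 2


ell, m, theta_phi = 3, 2, 0.4
lb = complex_ell(ell, m, theta_phi)
m_prime = abs(m) * math.cos(theta_phi) * cmath.exp(-1j * theta_phi)  # equation (70A)
print(lb, m_prime + ell - abs(m))                         # same number, first line of (81)
print(abs(lb) ** 2, ell_magnitude_sq(ell, m, theta_phi))  # equation (86)
print(cmath.phase(lb), -abs(m) * theta_phi / ell)         # exact phase vs (85A)
```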

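5. THE RADIAL EQUATION FOR HYDROGEN-LIKE ATOMS WITH BROKEN INTERNAL SYMMETRIES. In this section the complex number radial equation (45) is solved and the complex principal quantum number is introduced. Combining equations (45) and (73) gives

$$ \bar r^2\frac{d^2\bar R}{d\bar r^2} + 2\bar r\,\frac{d\bar R}{d\bar r} + \left[\bar k^2\bar r^2 - \bar\ell(\bar\ell + 1)\right]\bar R = 0 \tag{91} $$

where $\bar\ell$ is given by equations (74) or (81) and $\bar k$ is given by

$$ \bar k^2 = \frac{2\mu}{\hbar^2}\left(\bar E + \frac{Ze^2}{\bar r}\right) \tag{92} $$

The solution to equation (91) can be found by a generalization of the standard method developed for the real number form of the radial equation.¹²⁻¹⁴ A change of dependent and independent variables is made by writing

$$ \bar\rho = 2\bar\alpha\bar r = \rho\,e^{j\theta_\rho} \tag{93} $$
$$ \bar R = \bar\rho^{\bar\ell}\,e^{-\bar\rho/2}\,\bar L(\bar\rho) = R\,e^{j\theta_R} \tag{94} $$

where

$$ \bar\alpha^2 = -\frac{2\mu\bar E}{\hbar^2} = \alpha^2 e^{2j\theta_\alpha} \tag{95} $$

Substituting equations (93) through (95) into equation (91) gives

$$ \bar\rho\,\frac{d^2\bar L}{d\bar\rho^2} + \left[2(\bar\ell + 1) - \bar\rho\right]\frac{d\bar L}{d\bar\rho} + (\bar n - \bar\ell - 1)\bar L = 0 \tag{96} $$

where

$$ \bar n = |\bar n|\,e^{j\theta_n} = \frac{\mu Ze^2}{\bar\alpha\hbar^2} = \frac{Z}{a_0\bar\alpha} \tag{97} $$
$$ a_0 = \frac{\hbar^2}{\mu e^2} \tag{98} $$

where $\bar n$ = complex principal quantum number and $a_0$ = Bohr radius. From equation (95) it follows that

$$ \alpha^2 = -\frac{2\mu E}{\hbar^2} \tag{99} $$
$$ 2\theta_\alpha = \theta_E \tag{100} $$

while from equation (97) it follows that

$$ |\bar n| = \frac{\mu Ze^2}{\alpha\hbar^2} = \frac{Z}{a_0\alpha} \tag{101} $$
$$ \theta_n = -\theta_\alpha = -\frac{\theta_E}{2} \tag{102} $$

Finally, from equations (93) and (97) it follows that

$$ \bar\rho = \frac{2Z\bar r}{a_0\bar n} \tag{103} $$
$$ \rho = \frac{2Zr}{a_0|\bar n|} \tag{104} $$
$$ \theta_\rho = \theta_r + \theta_\alpha = \theta_r - \theta_n = \theta_r + \frac{\theta_E}{2} \tag{105} $$

Thus $\theta_\rho$ introduces the radial coordinate internal phase angle $\theta_r$. The values of $|\bar n|$ and $\theta_n$ will be determined later in this section.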

The solution of equation (96) can be obtained by a power series whose terms have the form $a_{\nu'}\bar\rho^{\nu'}$, where the $a_{\nu'}$ are determined by the following generalization of the standard recursion formula:¹²

$$ (\bar n - \bar\ell - 1 - \nu')\,a_{\nu'} + \left[2(\nu' + 1)(\bar\ell + 1) + \nu'(\nu' + 1)\right]a_{\nu'+1} = 0 \tag{106} $$

where $\nu'$ = integer. The condition that $a_N\bar\rho^N$ be the last non-zero term is that $a_{N+1} = 0$, so that the break-off condition is

$$ \bar n = \bar\ell + 1 + N \tag{107} $$

where $N$ = integer. Equation (107) is the complex number generalization of the standard scalar result.¹² For a hydrogen-like atom with broken internal symmetries the principal quantum number is a complex number related to $\bar\ell$ by equation (107). Equations (106) and (107) show that a break-off solution to equation (96) is possible for complex principal and azimuthal quantum numbers provided that $\bar n - \bar\ell$ is an integer. Because equation (107) is also valid for zero internal phase angles, it follows that

$$ n = \ell + 1 + N \tag{108} $$

where $n$ and $\ell$ = standard integer principal and azimuthal quantum numbers respectively.

Combining equations (107) and (108) gives

$$ \bar n = \bar\ell + n - \ell \tag{109} $$

where $\bar\ell$ is given by equations (74) or (81). The real and imaginary parts of equation (109) give

$$ |\bar n|\cos\theta_n = |\bar\ell|\cos\theta_\ell + n - \ell \tag{110} $$
$$ |\bar n|\sin\theta_n = |\bar\ell|\sin\theta_\ell \tag{111} $$

Combining equations (110) and (111) gives

$$ \tan\theta_n = \frac{|\bar\ell|\sin\theta_\ell}{|\bar\ell|\cos\theta_\ell + n - \ell} \tag{112} $$
$$ |\bar n|^2 = |\bar\ell|^2 + 2|\bar\ell|(n - \ell)\cos\theta_\ell + (n - \ell)^2 \tag{113} $$

where $\theta_\ell$ and $|\bar\ell|$ are given by equations (85) and (86) respectively. Alternatively, equation (109) can be rewritten using equation (81) with the result

$$ \bar n = \bar M' + n - |m| = |m|\cos\theta_\phi\,e^{-j\theta_\phi} + n - |m| \tag{114} $$

where $\bar M'$ is given by equation (70A). Taking the real and imaginary parts of equation (114) gives

$$ |\bar n|\cos\theta_n = M'\cos\theta_{M'} + n - |m| = |m|\cos^2\theta_\phi + n - |m| = n - |m|\sin^2\theta_\phi \tag{115} $$
$$ |\bar n|\sin\theta_n = M'\sin\theta_{M'} = -|m|\sin\theta_\phi\cos\theta_\phi \tag{116} $$

Combining equations (115) and (116) gives

$$ \tan\theta_n = -\frac{|m|\sin\theta_\phi\cos\theta_\phi}{n - |m|\sin^2\theta_\phi} \tag{117} $$
$$ |\bar n|^2 = n^2\left[1 - \frac{|m|}{n}\left(2 - \frac{|m|}{n}\right)\sin^2\theta_\phi\right] = n^2 - |m|(2n - |m|)\sin^2\theta_\phi \tag{118} $$
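The two routes to $\bar n$ (equation (109) via $\bar\ell$, and equation (114) via $\bar M'$) must agree, and the result must reproduce the magnitude and phase in equations (117) and (118). The Python sketch below checks this for a few arbitrary quantum numbers and angles; it is an illustrative numerical check, and `complex_n` is a hypothetical helper name.

```python
import cmath
import math


def complex_n(n, m, theta_phi):
    """Complex principal quantum number from equation (114)."""
    m_prime = abs(m) * math.cos(theta_phi) * cmath.exp(-1j * theta_phi)  # (70A)
    return m_prime + n - abs(m)


def n_magnitude_sq(n, m, theta_phi):
    """|n-bar|^2 from equation (118)."""
    return n**2 - abs(m) * (2 * n - abs(m)) * math.sin(theta_phi) ** 2


n, ell, m, theta_phi = 3, 2, 1, 0.3
nb = complex_n(n, m, theta_phi)
s, c = math.sin(theta_phi), math.cos(theta_phi)
ell_bar = ell - abs(m) * s * s - 1j * abs(m) * s * c    # equation (81)
print(nb, ell_bar + n - ell)                            # equations (114) and (109) agree
print(abs(nb) ** 2, n_magnitude_sq(n, m, theta_phi))    # equation (118)
print(math.tan(cmath.phase(nb)), -abs(m) * s * c / (n - abs(m) * s * s))  # equation (117)
```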


Equations (117) and (118) give the internal phase angle and magnitude, respectively, of the complex principal quantum number for a hydrogen-like atom with broken internal symmetries. The values of $\theta_n$ and $|\bar n|$ depend on both $n$ and $|m|$. For $m = 0$ it follows that $\theta_n = 0$ and $|\bar n| = n$. For the symmetric case with $\theta_\phi = 0$ it follows that $\theta_n = 0$ and $|\bar n| = n$. From equation (118) it follows that for small $\theta_\phi$

$$ |\bar n| \simeq n\left[1 - \frac{1}{2}\,\frac{|m|}{n}\left(2 - \frac{|m|}{n}\right)\sin^2\theta_\phi\right] \tag{119} $$

Equation (118) shows that $|\bar n| \le n$. Also, from equation (117) it follows that for small $\theta_\phi$

$$ \theta_n \simeq -\frac{|m|}{n}\,\theta_\phi \tag{120} $$

while equations (102) and (105) show that

$$ \theta_E \simeq \frac{2|m|}{n}\,\theta_\phi \tag{121} $$
$$ \theta_\rho \simeq \theta_r + \frac{|m|}{n}\,\theta_\phi \tag{122} $$

Equation (96) is a complex number generalization of the standard differential equation satisfied by associated Laguerre polynomials.¹² In fact the solution of equation (96) is a complex number associated Laguerre polynomial of degree $\bar n - \bar\ell - 1 = N$ and order $2\bar\ell + 1$. Thus the degree is a real number (an integer) but the order is a complex number. The solution of equation (96) can be written formally as¹²

$$ \bar L(\bar\rho) = L^{2\bar\ell+1}_{\bar n+\bar\ell}(\bar\rho) \tag{123} $$

The argument $\bar\rho$ is a complex number given by equation (93). The series solution to equation (96) is obtained from the recursion relations in equation (106), so that

$$ L^{2\bar\ell+1}_{\bar n+\bar\ell}(\bar\rho) = 1 - \frac{N}{2(\bar\ell + 1)}\,\bar\rho + \frac{N(N - 1)}{2(\bar\ell + 1)\left[4(\bar\ell + 1) + 2\right]}\,\bar\rho^2 - \frac{N(N - 1)(N - 2)}{2(\bar\ell + 1)\left[4(\bar\ell + 1) + 2\right]\left[6(\bar\ell + 1) + 6\right]}\,\bar\rho^3 + \cdots \tag{124} $$

out to $\bar\rho^N$. Because the order $2\bar\ell + 1$ of the polynomial is a complex number, it does not make sense to derive the complex number associated Laguerre polynomials in terms of derivatives of the Laguerre polynomials.¹² For the case of complex number order, equation (96) must be considered as the fundamental defining relation, and equation (124) is the basic solution. Also, the complex number associated Laguerre polynomials cannot be derived from a generating function, as this requires the order to be an integer.¹² Finally, the complex number associated Laguerre polynomials can always be written as

$$ L^{2\bar\ell+1}_{\bar n+\bar\ell}(\bar\rho) = \left|L^{2\bar\ell+1}_{\bar n+\bar\ell}\right|(\rho, \theta_\rho, |\bar\ell|, \theta_\ell)\; e^{j\theta_L} \tag{125} $$

where $\theta_L$ = internal phase angle of the complex associated Laguerre polynomial.
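Recursion (106) can be run directly with a complex order. The Python sketch below is an illustration, not code from the paper (helper names are hypothetical, and $\bar\ell$ and $\bar\rho$ are arbitrary complex test values): it builds the coefficients of series (124) for a given integer degree $N$, then evaluates the left-hand side of equation (96) at a complex $\bar\rho$ using $\bar n - \bar\ell - 1 = N$ from equation (107). A residual at rounding level indicates that the terminating series solves the equation even though the order $2\bar\ell + 1$ is complex.

```python
def laguerre_coefficients(N, ell_bar):
    """Coefficients a_k (a_0 = 1) of the series (124), from recursion (106).

    With n-bar - ell-bar - 1 = N, (106) gives
    a_{k+1} = -(N - k) a_k / [(k + 1)(2(ell_bar + 1) + k)], so a_{N+1} = 0.
    """
    a = [1.0 + 0.0j]
    for k in range(N):
        a.append(-a[k] * (N - k) / ((k + 1) * (2 * (ell_bar + 1) + k)))
    return a


def ode_residual(N, ell_bar, rho):
    """Left-hand side of equation (96) for the polynomial built above."""
    a = laguerre_coefficients(N, ell_bar)
    L = sum(c * rho**k for k, c in enumerate(a))
    dL = sum(k * c * rho ** (k - 1) for k, c in enumerate(a) if k >= 1)
    d2L = sum(k * (k - 1) * c * rho ** (k - 2) for k, c in enumerate(a) if k >= 2)
    return rho * d2L + (2 * (ell_bar + 1) - rho) * dL + N * L


ell_bar = 0.95 - 0.15j       # an arbitrary complex azimuthal quantum number
rho = 1.3 + 0.2j             # an arbitrary complex argument
for N in range(4):
    print(N, abs(ode_residual(N, ell_bar, rho)))   # rounding-level residuals
```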

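The first few complex number associated Laguerre polynomials are obtained from equation (124) to be:

$N = 0$, $\bar n = \bar\ell + 1$, $n = \ell + 1$:
$$ L^{2\bar\ell+1}_{2\bar\ell+1} = 1 \tag{126} $$

$N = 1$, $\bar n = \bar\ell + 2$, $n = \ell + 2$:
$$ L^{2\bar\ell+1}_{2\bar\ell+2} = 1 - \frac{\bar\rho}{2(\bar\ell + 1)} = \frac{2(\bar\ell + 1) - \bar\rho}{2(\bar\ell + 1)} \tag{127} $$

$N = 2$, $\bar n = \bar\ell + 3$, $n = \ell + 3$:
$$ L^{2\bar\ell+1}_{2\bar\ell+3} = 1 - \frac{2\bar\rho}{2(\bar\ell + 1)} + \frac{2\bar\rho^2}{2(\bar\ell + 1)\left[4(\bar\ell + 1) + 2\right]} = \frac{2(\bar\ell + 1)(2\bar\ell + 3) - 2(2\bar\ell + 3)\bar\rho + \bar\rho^2}{2(\bar\ell + 1)(2\bar\ell + 3)} \tag{128} $$

$N = 3$, $\bar n = \bar\ell + 4$, $n = \ell + 4$:
$$ L^{2\bar\ell+1}_{2\bar\ell+4} = 1 - \frac{3\bar\rho}{2(\bar\ell + 1)} + \frac{6\bar\rho^2}{2(\bar\ell + 1)\left[4(\bar\ell + 1) + 2\right]} - \frac{6\bar\rho^3}{2(\bar\ell + 1)\left[4(\bar\ell + 1) + 2\right]\left[6(\bar\ell + 1) + 6\right]} = \frac{4(\bar\ell + 1)(2\bar\ell + 3)(\bar\ell + 2) - 6(2\bar\ell + 3)(\bar\ell + 2)\bar\rho + 6(\bar\ell + 2)\bar\rho^2 - \bar\rho^3}{4(\bar\ell + 1)(2\bar\ell + 3)(\bar\ell + 2)} \tag{129} $$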
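The closed forms (126) through (129) can be checked against direct summation of series (124). The Python sketch below is an illustrative check, not code from the paper: `series_laguerre` accumulates the terms $a_k\bar\rho^k$ using the term ratio implied by recursion (106), and `closed_form_laguerre` transcribes the rational expressions above (both names are hypothetical). Setting $\bar\ell = 0$ in the $N = 2$ form gives $(6 - 6\bar\rho + \bar\rho^2)/6$, the polynomial that appears in the 3s entry of the Section 6 table.

```python
def series_laguerre(N, ell_bar, rho):
    """Evaluate the terminating series (124) term by term via recursion (106)."""
    term, total = 1.0 + 0.0j, 1.0 + 0.0j
    for k in range(N):
        term *= -(N - k) * rho / ((k + 1) * (2 * (ell_bar + 1) + k))
        total += term
    return total


def closed_form_laguerre(N, l, rho):
    """Closed forms (126)-(129); l stands for the complex ell-bar."""
    if N == 0:
        return 1.0 + 0.0j
    if N == 1:
        return (2 * (l + 1) - rho) / (2 * (l + 1))
    if N == 2:
        return (2 * (l + 1) * (2 * l + 3) - 2 * (2 * l + 3) * rho + rho**2) / (
            2 * (l + 1) * (2 * l + 3))
    if N == 3:
        num = (4 * (l + 1) * (2 * l + 3) * (l + 2) - 6 * (2 * l + 3) * (l + 2) * rho
               + 6 * (l + 2) * rho**2 - rho**3)
        return num / (4 * (l + 1) * (2 * l + 3) * (l + 2))
    raise ValueError("closed forms are listed only for N = 0..3")


ell_bar = 0.9 - 0.2j
rho = 1.7 + 0.3j
for N in range(4):
    print(N, abs(series_laguerre(N, ell_bar, rho) - closed_form_laguerre(N, ell_bar, rho)))
```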


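The solution of the radial equation for hydrogen-like atoms with broken internal symmetries is then obtained from equations (94) and (123) to be

$$ \bar R = C\,\bar\rho^{\bar\ell}\,e^{-\bar\rho/2}\,L^{2\bar\ell+1}_{\bar n+\bar\ell}(\bar\rho) \tag{130} $$

which is a complex number generalization of the standard result. The term $\bar\rho^{\bar\ell}$ can be written explicitly as follows:

$$ \bar\rho^{\bar\ell} = e^{\bar\ell\ln\bar\rho} \tag{131} $$

where

$$ \ln\bar\rho = \ln\rho + j\theta_\rho \tag{132} $$

Using equation (81) for $\bar\ell$ gives

$$ \bar\rho^{\bar\ell} = e^{A + jB} \tag{133} $$

where

$$ A = (\ell - |m|\sin^2\theta_\phi)\ln\rho + |m|\,\theta_\rho\sin\theta_\phi\cos\theta_\phi \tag{134} $$
$$ B = -|m|\sin\theta_\phi\cos\theta_\phi\ln\rho + \theta_\rho(\ell - |m|\sin^2\theta_\phi) \tag{135} $$

Note also that

$$ e^{-\bar\rho/2} = e^{-(\rho/2)(\cos\theta_\rho + j\sin\theta_\rho)} \tag{136} $$

The quantities $\rho$ and $\theta_\rho$ that appear in equations (134) through (136) depend on $|\bar n|$ and $\theta_n$ through equations (104) and (105), and in turn $|\bar n|$ and $\theta_n$ depend on $n$, $|m|$ and $\theta_\phi$ through equations (118) and (117) respectively. Combining equations (125), (130), (133) and (136) gives

$$ \bar R = C\,e^{A - (\rho/2)\cos\theta_\rho}\,e^{j\left[B - (\rho/2)\sin\theta_\rho + \theta_L\right]}\,\left|L^{2\bar\ell+1}_{\bar n+\bar\ell}\right|(\rho, \theta_\rho, |\bar\ell|, \theta_\ell) \tag{137} $$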
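The decomposition $\bar\rho^{\bar\ell} = e^{A + jB}$ in equations (133) through (135), and the split of $e^{-\bar\rho/2}$ in equation (136), can be confirmed numerically. The Python sketch below is an illustration under stated assumptions: $\theta_\rho$ is taken inside $(-\pi, \pi]$ so that Python's principal-branch complex power matches equation (132), and the values of $\ell$, $m$, $\theta_\phi$, $\rho$ and $\theta_\rho$ are arbitrary. `A_B` is a hypothetical helper name.

```python
import cmath
import math


def A_B(ell, m, theta_phi, rho, theta_rho):
    """A and B of equations (134) and (135)."""
    s, c = math.sin(theta_phi), math.cos(theta_phi)
    re_ell = ell - abs(m) * s * s          # real part of ell-bar, equation (81)
    A = re_ell * math.log(rho) + abs(m) * theta_rho * s * c
    B = -abs(m) * s * c * math.log(rho) + theta_rho * re_ell
    return A, B


ell, m, theta_phi = 2, 1, 0.35
rho, theta_rho = 1.8, 0.12
s, c = math.sin(theta_phi), math.cos(theta_phi)
ell_bar = ell - abs(m) * s * s - 1j * abs(m) * s * c
rho_bar = rho * cmath.exp(1j * theta_rho)
A, B = A_B(ell, m, theta_phi, rho, theta_rho)
print(rho_bar ** ell_bar, cmath.exp(A + 1j * B))   # equation (133): same number
```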
The interesting thing about the complex number broken symmetry radial solution in equation (137) is that it depends on the magnetic quantum number $|m|$ as well as on the principal quantum number $n$. In fact, equations (104) and (119) show that

$$ \rho = \frac{2Zr}{a_0|\bar n|} \simeq \frac{2Zr}{a_0 n}\left[1 + \frac{1}{2}\,\frac{|m|}{n}\left(2 - \frac{|m|}{n}\right)\sin^2\theta_\phi\right] \tag{138} $$

where the approximation is for small $\theta_\phi$. Also, from equation (122), $\theta_\rho$ introduces both $\theta_r$ and $\theta_\phi$. Therefore for $m \neq 0$ both $\theta_r$ and $\theta_\phi$ appear in the solution of the complex number radial equation. As the simplest case consider $n = 1$, $\ell = 0$ and $m = 0$. Then $A = 0$, $B = 0$ and $|\bar n| = 1$, and

$$ \rho = \frac{2Zr}{a_0} \tag{139} $$
$$ \theta_\rho = \theta_r \tag{140} $$
$$ \bar R_{10} = C\,e^{-Z\bar r/a_0} \tag{141} $$
$$ R_{m,10} = C\,e^{-(Zr/a_0)\cos\theta_r}\cos\!\left(\frac{Zr}{a_0}\sin\theta_r\right) \tag{142} $$

where the subscript $m$ in equation (142) denotes the measured (real) part of $\bar R_{10}$.
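Equation (142) is the real part of equation (141) with $\bar r = r e^{j\theta_r}$, since $\mathrm{Re}\,e^{-x - jy} = e^{-x}\cos y$. The Python sketch below checks this numerically; it is illustrative only, the helper names are hypothetical, and the defaults $Z = a_0 = C = 1$ are assumed unit values rather than physical data.

```python
import cmath
import math


def radial_1s_complex(r, theta_r, Z=1.0, a0=1.0, C=1.0):
    """R-bar_10 = C exp(-Z r-bar / a0), equation (141), with r-bar = r exp(j theta_r)."""
    return C * cmath.exp(-Z * r * cmath.exp(1j * theta_r) / a0)


def radial_1s_measured(r, theta_r, Z=1.0, a0=1.0, C=1.0):
    """Measured radial function, equation (142)."""
    x = Z * r / a0
    return C * math.exp(-x * math.cos(theta_r)) * math.cos(x * math.sin(theta_r))


for r in (0.5, 1.0, 2.0):
    z = radial_1s_complex(r, 0.1)
    print(r, z.real, radial_1s_measured(r, 0.1))   # real part equals equation (142)
```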

The wave functions for hydrogen-like atoms with broken internal symmetries are listed by atomic shells in Section 6.

6. WAVE FUNCTIONS FOR A HYDROGEN-LIKE ATOM WITH BROKEN INTERNAL SYMMETRIES. This section consists of a table of wave functions for the atomic shells of hydrogen-like atoms with broken internal symmetries. The broken internal symmetry is due basically to the broken internal symmetries of the coordinates, which are described by $\theta_r$, $\theta_\psi$ and $\theta_\phi$ as discussed in Sections 3 through 5. The complex magnetic quantum number given by equation (70A) appears frequently in the atomic wave functions and will be written as

$$ \bar M' = |m|\,\bar\gamma, \qquad M' = |m|\,\gamma \tag{143} $$

where

$$ \bar\gamma = \cos\theta_\phi\,e^{-j\theta_\phi} = \cos^2\theta_\phi - j\cos\theta_\phi\sin\theta_\phi \tag{144} $$
$$ \gamma = \cos\theta_\phi \tag{145} $$

The following is a table of complete wave functions for the K, L, M and N shells. This table is a generalization of the standard scalar results.¹²
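The parameter lines in the table follow mechanically from equations (70A), (81), (103), (108) and (109). The Python sketch below, an illustrative helper rather than anything in the paper (the function name and dictionary keys are hypothetical), computes them for given $(n, \ell, m, \theta_\phi)$ so that individual table entries can be checked; for example, the $n = 3$, $\ell = 2$, $m = 1$ entry should give $\bar\ell = \bar\gamma + 1$, $\bar n = \bar\gamma + 2$ and $|\bar n| = 3(1 - \tfrac{5}{9}\sin^2\theta_\phi)^{1/2}$.

```python
import cmath
import math


def shell_parameters(n, ell, m, theta_phi):
    """Complex quantum numbers used in the Section 6 table for given (n, ell, m).

    Returns M'-bar (70A), ell-bar (81), n-bar (109), the Laguerre degree N (108),
    the order 2*ell-bar + 1 and index n-bar + ell-bar (123), and rho_factor,
    defined by rho-bar = rho_factor * Z * r-bar / a0, i.e. 2 / n-bar (103).
    """
    if not (0 <= ell < n and abs(m) <= ell):
        raise ValueError("need 0 <= ell < n and |m| <= ell")
    gamma_bar = math.cos(theta_phi) * cmath.exp(-1j * theta_phi)   # (144)
    m_prime = abs(m) * gamma_bar                                    # (143)
    ell_bar = m_prime + ell - abs(m)                                # (81)
    n_bar = ell_bar + n - ell                                       # (109)
    return {
        "M'": m_prime,
        "ell": ell_bar,
        "n": n_bar,
        "N": n - ell - 1,
        "order": 2 * ell_bar + 1,
        "index": n_bar + ell_bar,
        "rho_factor": 2 / n_bar,
    }


th = 0.3
p = shell_parameters(3, 2, 1, th)
print(p["ell"], p["n"], p["N"])
print(abs(p["n"]), 3 * math.sqrt(1 - 5 / 9 * math.sin(th) ** 2))   # |n-bar| from the table
```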
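K shell: $n = 1$

$\ell = 0$, $m = 0$, $\bar M' = 0$, $M' = 0$, $\bar\ell = 0$, $|\bar\ell| = 0$, $\bar n = 1$, $|\bar n| = 1$, $2\bar\ell + 1 = 1$, $\bar n + \bar\ell = 1$, $\bar\rho = 2Z\bar r/a_0$, $N = 0$
$$ \bar\Psi_{1s_0} = C_{1s_0}\,e^{-\bar\rho/2} \tag{146} $$

L shell: $n = 2$

$\ell = 0$, $m = 0$, $\bar M' = 0$, $M' = 0$, $\bar\ell = 0$, $|\bar\ell| = 0$, $\bar n = 2$, $|\bar n| = 2$, $2\bar\ell + 1 = 1$, $\bar n + \bar\ell = 2$, $\bar\rho = 2Z\bar r/(2a_0)$, $N = 1$
$$ \bar\Psi_{2s_0} = C_{2s_0}\,\tfrac{1}{2}(2 - \bar\rho)\,e^{-\bar\rho/2} \tag{147} $$

$\ell = 1$, $m = 0$, $\bar M' = 0$, $M' = 0$, $\bar\ell = 1$, $|\bar\ell| = 1$, $\bar n = 2$, $|\bar n| = 2$, $2\bar\ell + 1 = 3$, $\bar n + \bar\ell = 3$, $\bar\rho = 2Z\bar r/(2a_0)$, $N = 0$
$$ \bar\Psi_{2p_0} = C_{2p_0}\,e^{-\bar\rho/2}\,\bar\rho\cos\bar\psi \tag{148} $$

$\ell = 1$, $m = 1$, $\bar M' = \bar\gamma$, $M' = \gamma$, $\bar\ell = \bar\gamma$, $|\bar\ell| = \gamma$, $\bar n = \bar\gamma + 1$, $|\bar n| = 2(1 - \tfrac{3}{4}\sin^2\theta_\phi)^{1/2}$, $2\bar\ell + 1 = 2\bar\gamma + 1$, $\bar n + \bar\ell = 2\bar\gamma + 1$, $\bar\rho = 2Z\bar r/[(\bar\gamma + 1)a_0]$, $N = 0$
$$ \bar\Psi_{2p_1} = C_{2p_1}\,e^{-\bar\rho/2}\,\bar\rho^{\bar\gamma}\sin^{\bar\gamma}\bar\psi\,\cos(\bar\gamma\bar\phi) \tag{149} $$

$\ell = 1$, $m = -1$, same as for equation (149)
$$ \bar\Psi_{2p_{-1}} = C_{2p_{-1}}\,e^{-\bar\rho/2}\,\bar\rho^{\bar\gamma}\sin^{\bar\gamma}\bar\psi\,\sin(\bar\gamma\bar\phi) \tag{150} $$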
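M shell: $n = 3$

$\ell = 0$, $m = 0$, $\bar M' = 0$, $M' = 0$, $\bar\ell = 0$, $|\bar\ell| = 0$, $\bar n = 3$, $|\bar n| = 3$, $2\bar\ell + 1 = 1$, $\bar n + \bar\ell = 3$, $\bar\rho = 2Z\bar r/(3a_0)$, $N = 2$
$$ \bar\Psi_{3s_0} = C_{3s_0}\,e^{-\bar\rho/2}\,\tfrac{1}{6}(6 - 6\bar\rho + \bar\rho^2) \tag{151} $$

$\ell = 1$, $m = 0$, $\bar M' = 0$, $M' = 0$, $\bar\ell = 1$, $|\bar\ell| = 1$, $\bar n = 3$, $|\bar n| = 3$, $2\bar\ell + 1 = 3$, $\bar n + \bar\ell = 4$, $\bar\rho = 2Z\bar r/(3a_0)$, $N = 1$
$$ \bar\Psi_{3p_0} = C_{3p_0}\,e^{-\bar\rho/2}\,\frac{\bar\rho}{4}(4 - \bar\rho)\cos\bar\psi \tag{152} $$

$\ell = 1$, $m = 1$, $\bar M' = \bar\gamma$, $M' = \gamma$, $\bar\ell = \bar\gamma$, $|\bar\ell| = \gamma$, $\bar n = \bar\gamma + 2$, $|\bar n| = 3(1 - \tfrac{5}{9}\sin^2\theta_\phi)^{1/2}$, $2\bar\ell + 1 = 2\bar\gamma + 1$, $\bar n + \bar\ell = 2\bar\gamma + 2$, $\bar\rho = 2Z\bar r/[(\bar\gamma + 2)a_0]$, $N = 1$
$$ \bar\Psi_{3p_1} = C_{3p_1}\,e^{-\bar\rho/2}\,\bar\rho^{\bar\gamma}\,\frac{2(\bar\gamma + 1) - \bar\rho}{2(\bar\gamma + 1)}\,\sin^{\bar\gamma}\bar\psi\,\cos(\bar\gamma\bar\phi) \tag{153} $$

$\ell = 1$, $m = -1$, same as for equation (153)
$$ \bar\Psi_{3p_{-1}} = C_{3p_{-1}}\,e^{-\bar\rho/2}\,\bar\rho^{\bar\gamma}\,\frac{2(\bar\gamma + 1) - \bar\rho}{2(\bar\gamma + 1)}\,\sin^{\bar\gamma}\bar\psi\,\sin(\bar\gamma\bar\phi) \tag{154} $$

$\ell = 2$, $m = 0$, $\bar M' = 0$, $M' = 0$, $\bar\ell = 2$, $|\bar\ell| = 2$, $\bar n = 3$, $|\bar n| = 3$, $2\bar\ell + 1 = 5$, $\bar n + \bar\ell = 5$, $\bar\rho = 2Z\bar r/(3a_0)$, $N = 0$
$$ \bar\Psi_{3d_0} = C_{3d_0}\,e^{-\bar\rho/2}\,\bar\rho^2(3\cos^2\bar\psi - 1) \tag{155} $$

$\ell = 2$, $m = 1$, $\bar M' = \bar\gamma$, $M' = \gamma$, $\bar\ell = \bar\gamma + 1$, $|\bar\ell| = 2(1 - \tfrac{3}{4}\sin^2\theta_\phi)^{1/2}$, $\bar n = \bar\gamma + 2$, $|\bar n| = 3(1 - \tfrac{5}{9}\sin^2\theta_\phi)^{1/2}$, $2\bar\ell + 1 = 2\bar\gamma + 3$, $\bar n + \bar\ell = 2\bar\gamma + 3$, $\bar\rho = 2Z\bar r/[(\bar\gamma + 2)a_0]$, $N = 0$
$$ \bar\Psi_{3d_1} = C_{3d_1}\,e^{-\bar\rho/2}\,\bar\rho^{\bar\gamma+1}\sin^{\bar\gamma}\bar\psi\cos\bar\psi\,\cos(\bar\gamma\bar\phi) \tag{156} $$

$\ell = 2$, $m = -1$, same as for equation (156)
$$ \bar\Psi_{3d_{-1}} = C_{3d_{-1}}\,e^{-\bar\rho/2}\,\bar\rho^{\bar\gamma+1}\sin^{\bar\gamma}\bar\psi\cos\bar\psi\,\sin(\bar\gamma\bar\phi) \tag{157} $$

$\ell = 2$, $m = 2$, $\bar M' = 2\bar\gamma$, $M' = 2\gamma$, $\bar\ell = 2\bar\gamma$, $|\bar\ell| = 2\gamma$, $\bar n = 2\bar\gamma + 1$, $|\bar n| = 3(1 - \tfrac{8}{9}\sin^2\theta_\phi)^{1/2}$, $2\bar\ell + 1 = 4\bar\gamma + 1$, $\bar n + \bar\ell = 4\bar\gamma + 1$, $\bar\rho = 2Z\bar r/[(2\bar\gamma + 1)a_0]$, $N = 0$
$$ \bar\Psi_{3d_2} = C_{3d_2}\,e^{-\bar\rho/2}\,\bar\rho^{2\bar\gamma}\sin^{2\bar\gamma}\bar\psi\,\cos(2\bar\gamma\bar\phi) \tag{158} $$

$\ell = 2$, $m = -2$, same as for equation (158)
$$ \bar\Psi_{3d_{-2}} = C_{3d_{-2}}\,e^{-\bar\rho/2}\,\bar\rho^{2\bar\gamma}\sin^{2\bar\gamma}\bar\psi\,\sin(2\bar\gamma\bar\phi) \tag{159} $$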
$\ell = 1,\ m = 0,\ M' = 0,\ M'' = 0,\ \mathcal{L} = 1,\ \bar{\ell} = 1,\ \mathcal{N} = 4,\ \bar{n} = 4,$
$2\mathcal{L} + 1 = 3,\ \mathcal{N} + \mathcal{L} = 5,\ \rho = 2Zr/(4a_0),\ N = 2$

$$\Psi_{4p_0} = C_{4p_0}\,e^{-\rho/2}\,\tfrac{\rho}{20}(20 - 10\rho + \rho^2)\cos\psi \qquad (161)$$


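$\ell = 1,\ m = 1,\ M' = \mu,\ M'' = \mu,\ \mathcal{L} = \mu,\ \bar{\ell} = \mu,\ \mathcal{N} = \mu + 3,$
$\bar{n} = 4(1 - \tfrac{7}{16}\sin^2\theta_\phi)^{1/2},\ 2\mathcal{L} + 1 = 2\mu + 1,\ \mathcal{N} + \mathcal{L} = 2\mu + 3,$
$\rho = 2Zr/[(\mu + 3)a_0],\ N = 2$

$$\Psi_{4p_1} = C_{4p_1}\,e^{-\rho/2}\rho^{\mu}\,\frac{2(\mu + 1)(2\mu + 3) - 2(2\mu + 3)\rho + \rho^2}{2(\mu + 1)(2\mu + 3)}\,\sin^{\mu}\psi\,\cos(\mu\phi) \qquad (162)$$

$\ell = 1,\ m = -1,$ same as for equation (162)

$$\Psi_{4p_{-1}} = C_{4p_{-1}}\,e^{-\rho/2}\rho^{\mu}\,\frac{2(\mu + 1)(2\mu + 3) - 2(2\mu + 3)\rho + \rho^2}{2(\mu + 1)(2\mu + 3)}\,\sin^{\mu}\psi\,\sin(\mu\phi) \qquad (163)$$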


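$\ell = 2,\ m = 0,\ M' = 0,\ M'' = 0,\ \mathcal{L} = 2,\ \bar{\ell} = 2,\ \mathcal{N} = 4,\ \bar{n} = 4,$
$2\mathcal{L} + 1 = 5,\ \mathcal{N} + \mathcal{L} = 6,\ \rho = 2Zr/(4a_0),\ N = 1$

$$\Psi_{4d_0} = C_{4d_0}\,e^{-\rho/2}\rho^2\,\tfrac{1}{6}(6 - \rho)(3\cos^2\psi - 1) \qquad (164)$$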


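$\ell = 2,\ m = 1,\ M' = \mu,\ M'' = \mu,\ \mathcal{L} = \mu + 1,\ \bar{\ell} = 2(1 - \tfrac{3}{4}\sin^2\theta_\phi)^{1/2},$
$\mathcal{N} = \mu + 3,\ \bar{n} = 4(1 - \tfrac{7}{16}\sin^2\theta_\phi)^{1/2},\ 2\mathcal{L} + 1 = 2\mu + 3,\ \mathcal{N} + \mathcal{L} = 2\mu + 4,$
$\rho = 2Zr/[(\mu + 3)a_0],\ N = 1$

$$\Psi_{4d_1} = C_{4d_1}\,e^{-\rho/2}\rho^{\mu+1}\,\frac{2(\mu + 2) - \rho}{2(\mu + 2)}\,\sin^{\mu}\psi\,\cos\psi\,\cos(\mu\phi) \qquad (165)$$

$\ell = 2,\ m = -1,$ same as for equation (165)

$$\Psi_{4d_{-1}} = C_{4d_{-1}}\,e^{-\rho/2}\rho^{\mu+1}\,\frac{2(\mu + 2) - \rho}{2(\mu + 2)}\,\sin^{\mu}\psi\,\cos\psi\,\sin(\mu\phi) \qquad (166)$$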


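$\ell = 2,\ m = 2,\ M' = 2\mu,\ M'' = 2\mu,\ \mathcal{L} = 2\mu,\ \bar{\ell} = 2\mu,\ \mathcal{N} = 2\mu + 2,$
$\bar{n} = 4(1 - \tfrac{3}{4}\sin^2\theta_\phi)^{1/2},\ 2\mathcal{L} + 1 = 4\mu + 1,\ \mathcal{N} + \mathcal{L} = 4\mu + 2,$
$\rho = 2Zr/[(2\mu + 2)a_0],\ N = 1$

$$\Psi_{4d_2} = C_{4d_2}\,e^{-\rho/2}\rho^{2\mu}\,\frac{2(2\mu + 1) - \rho}{2(2\mu + 1)}\,\sin^{2\mu}\psi\,\cos(2\mu\phi) \qquad (167)$$

$\ell = 2,\ m = -2,$ same as for equation (167)

$$\Psi_{4d_{-2}} = C_{4d_{-2}}\,e^{-\rho/2}\rho^{2\mu}\,\frac{2(2\mu + 1) - \rho}{2(2\mu + 1)}\,\sin^{2\mu}\psi\,\sin(2\mu\phi) \qquad (168)$$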


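$\ell = 3,\ m = 0,\ M' = 0,\ M'' = 0,\ \mathcal{L} = 3,\ \bar{\ell} = 3,\ \mathcal{N} = 4,\ \bar{n} = 4,$
$2\mathcal{L} + 1 = 7,\ \mathcal{N} + \mathcal{L} = 7,\ \rho = 2Zr/(4a_0),\ N = 0$

$$\Psi_{4f_0} = C_{4f_0}\,e^{-\rho/2}\rho^3(5\cos^3\psi - 3\cos\psi) \qquad (169)$$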


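$\ell = 3,\ m = 1,\ M' = \mu,\ M'' = \mu,\ \mathcal{L} = \mu + 2,\ \bar{\ell} = 3(1 - \tfrac{5}{9}\sin^2\theta_\phi)^{1/2},$
$\mathcal{N} = \mu + 3,\ \bar{n} = 4(1 - \tfrac{7}{16}\sin^2\theta_\phi)^{1/2},\ 2\mathcal{L} + 1 = 2\mu + 5,\ \mathcal{N} + \mathcal{L} = 2\mu + 5,$
$\rho = 2Zr/[(\mu + 3)a_0],\ N = 0$

$$\Psi_{4f_1} = C_{4f_1}\,e^{-\rho/2}\rho^{\mu+2}\sin^{\mu}\psi\,[(2\mu + 3)\cos^2\psi - 1]\cos(\mu\phi) \qquad (170)$$

$\ell = 3,\ m = -1,$ same as for equation (170)

$$\Psi_{4f_{-1}} = C_{4f_{-1}}\,e^{-\rho/2}\rho^{\mu+2}\sin^{\mu}\psi\,[(2\mu + 3)\cos^2\psi - 1]\sin(\mu\phi) \qquad (171)$$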


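$\ell = 3,\ m = 2,\ M' = 2\mu,\ M'' = 2\mu,\ \mathcal{L} = 2\mu + 1,\ \bar{\ell} = 3(1 - \tfrac{8}{9}\sin^2\theta_\phi)^{1/2},$
$\mathcal{N} = 2\mu + 2,\ \bar{n} = 4(1 - \tfrac{3}{4}\sin^2\theta_\phi)^{1/2},\ 2\mathcal{L} + 1 = 4\mu + 3,\ \mathcal{N} + \mathcal{L} = 4\mu + 3,$
$\rho = 2Zr/[(2\mu + 2)a_0],\ N = 0$

$$\Psi_{4f_2} = C_{4f_2}\,e^{-\rho/2}\rho^{2\mu+1}\sin^{2\mu}\psi\,\cos\psi\,\cos(2\mu\phi) \qquad (172)$$

$\ell = 3,\ m = -2,$ same as for equation (172)

$$\Psi_{4f_{-2}} = C_{4f_{-2}}\,e^{-\rho/2}\rho^{2\mu+1}\sin^{2\mu}\psi\,\cos\psi\,\sin(2\mu\phi) \qquad (173)$$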


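$\ell = 3,\ m = 3,\ M' = 3\mu,\ M'' = 3\mu,\ \mathcal{L} = 3\mu,\ \bar{\ell} = 3\mu,\ \mathcal{N} = 3\mu + 1,$
$\bar{n} = 4(1 - \tfrac{15}{16}\sin^2\theta_\phi)^{1/2},\ 2\mathcal{L} + 1 = 6\mu + 1,\ \mathcal{N} + \mathcal{L} = 6\mu + 1,$
$\rho = 2Zr/[(3\mu + 1)a_0],\ N = 0$

$$\Psi_{4f_3} = C_{4f_3}\,e^{-\rho/2}\rho^{3\mu}\sin^{3\mu}\psi\,\cos(3\mu\phi) \qquad (174)$$

$\ell = 3,\ m = -3,$ same as for equation (174)

$$\Psi_{4f_{-3}} = C_{4f_{-3}}\,e^{-\rho/2}\rho^{3\mu}\sin^{3\mu}\psi\,\sin(3\mu\phi) \qquad (175)$$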


The reader should be aware that the value of $\rho$ that appears in the above table depends on the real principal-number parameter in its denominator, and is therefore a function of both $n$ and $|m|$.
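The sin²θ_φ coefficients in the table appear to follow a single pattern, and the radial factors look like generalized Laguerre polynomials. The sketch below checks both mechanically. It rests on two assumptions inferred from the listed entries rather than stated in this section: that the magnitude of the complex principal number is n̄ = n(1 − c sin²θ_φ)^{1/2} with c = (|m|/n)(2 − |m|/n), the same form applying to ℓ̄ with n replaced by ℓ; and that each radial polynomial equals L_N^{(α)}(ρ)/L_N^{(α)}(0), with α taken as the tabulated "2ℓ + 1" entry and N as the tabulated N. The function names are hypothetical.

```python
from fractions import Fraction
import math


def sin2_coefficient(n, m):
    """Assumed coefficient c = (|m|/n)(2 - |m|/n) in n_bar = n*sqrt(1 - c*sin^2(theta_phi))."""
    x = Fraction(abs(m), n)
    return x * (2 - x)


def n_bar(n, m, theta_phi):
    """Magnitude of the complex principal quantum number under the assumed pattern."""
    c = float(sin2_coefficient(n, m))
    return n * math.sqrt(1.0 - c * math.sin(theta_phi) ** 2)


def normalized_laguerre(N, alpha, rho):
    """L_N^(alpha)(rho) / L_N^(alpha)(0); alpha need not be an integer."""
    term, total = 1.0, 1.0
    for k in range(1, N + 1):
        # ratio of the k-th series term to the (k-1)-th term
        term *= -rho * (N - k + 1) / (k * (alpha + k))
        total += term
    return total


# (n or l, |m|) -> coefficient of sin^2(theta_phi) as printed in the table
TABLE = {(2, 1): Fraction(3, 4), (3, 1): Fraction(5, 9), (3, 2): Fraction(8, 9),
         (4, 1): Fraction(7, 16), (4, 2): Fraction(3, 4), (4, 3): Fraction(15, 16)}

for (n, m), c in TABLE.items():
    print(n, m, sin2_coefficient(n, m) == c)

mu, rho = 0.9, 1.7                     # arbitrary illustrative values
f = 2 * (mu + 1) * (2 * mu + 3)
print(normalized_laguerre(2, 2 * mu + 1, rho),       # general Laguerre form
      (f - 2 * (2 * mu + 3) * rho + rho ** 2) / f)    # bracket of equation (162)
```

Setting μ = 1 in the general brackets recovers the m = 0 entries, for example (4 − ρ)/4 in equation (152) and (20 − 10ρ + ρ²)/20 in equation (161).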
7. ENERGY LEVELS AND RADII OF HYDROGEN-LIKE ATOMS WITH BROKEN INTERNAL SYMMETRY. This section calculates the measurable values of the energy levels and atomic radii of hydrogen-like atoms with broken internal symmetries. The complex number energy levels are obtained from equations (95) and (97) to be


$$\tilde{E}_n = E_n e^{j\theta_{E_n}} = -\hbar^2\tilde{\alpha}^2/(2\mu) = -\mu Z^2 e^4/(2\hbar^2\tilde{n}^2) \qquad (176)$$


where $\tilde{n}$ is given by equations (109) or (114). From equation (176) it follows that


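$$E_n = -\mu Z^2 e^4/(2\hbar^2\bar{n}^2) \approx -\frac{\mu Z^2 e^4}{2\hbar^2 n^2}\left[1 + \frac{|m|}{n}\left(2 - \frac{|m|}{n}\right)\sin^2\theta_\phi\right] \qquad (177)$$

$$\theta_{E_n} = -2\theta_n \qquad (178)$$

where $\bar{n}$ and $\theta_n$ are given by equations (118) and (117) respectively. The measured energy levels are given by7

$$E_{nm} = E_n\cos\theta_{E_n} = -\mu Z^2 e^4/(2\hbar^2\bar{n}^2)\,\cos(2\theta_n) \qquad (179)$$

From equation (116) it follows that

$$\sin\theta_n \approx -\frac{|m|}{n}\cos\theta_\phi\sin\theta_\phi \qquad (180)$$

$$\cos(2\theta_n) = 1 - 2\sin^2\theta_n \approx 1 - 2\left(\frac{|m|}{n}\right)^2\cos^2\theta_\phi\sin^2\theta_\phi \qquad (181)$$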


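Combining equations (177), (179) and (181) gives

$$E_{nm} \approx -\frac{\mu Z^2 e^4}{2\hbar^2 n^2}\,(1 + F\sin^2\theta_\phi) \qquad (182)$$

where

$$F = \frac{|m|}{n}\left[2 - \frac{|m|}{n}(1 + 2\cos^2\theta_\phi)\right] \approx \frac{|m|}{n}\left(2 - \frac{3|m|}{n}\right) \qquad (183)$$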


The values of $F$ are given approximately for small $\theta_\phi$ as follows

$$F \approx 0 \quad \text{if } m = 0 \text{ or } |m|/n = 2/3 \qquad (184)$$

$$F > 0 \quad \text{if } 0 < |m|/n < 2/3 \qquad (185)$$

$$F < 0 \quad \text{if } |m|/n > 2/3 \qquad (186)$$

Equations (177), (179) and (182) reduce to the standard Bohr result in equation (10) for the case $\theta_\phi = 0$.

For the K and L shells $|m|/n \le 1/2$, so that $F \ge 0$. For the M and P shells, etc., it is possible to have $|m|/n = 2/3$ and $F \approx 0$. For the N, O, P, ... shells it is possible to have $|m|/n > 2/3$ and $F < 0$. In any case it is clear from equations (182) and (183) that the measured energy eigenvalues of hydrogen-like atoms with broken internal symmetry should depend on $|m|$ as well as on the principal quantum number $n$. In addition, the asymmetry factor in equation (182) is $F\sin^2\theta_\phi$, where $\theta_\phi$ depends on the pressure acting on the system of atoms. Thus the broken symmetry of the azimuthal angle lifts the degeneracy associated with the energy eigenvalues given by equation (10). The energy eigenvalues do not depend explicitly on $\ell$, and therefore some degeneracy still remains.

A transition from a state $n', m'$ to the state $n, m$ is, according to equation (182), associated with the energy difference

$$\Delta E_{nm} = \frac{\mu Z^2 e^4}{2\hbar^2}\left[\frac{1}{n^2} - \frac{1}{n'^2} + \left(\frac{F}{n^2} - \frac{F'}{n'^2}\right)\sin^2\theta_\phi\right] \qquad (187)$$

where

$$\frac{F}{n^2} = \frac{|m|}{n^3}\left[2 - \frac{|m|}{n}(1 + 2\cos^2\theta_\phi)\right] \qquad (188)$$

$$\frac{F'}{n'^2} = \frac{|m'|}{n'^3}\left[2 - \frac{|m'|}{n'}(1 + 2\cos^2\theta_\phi)\right] \qquad (189)$$

$$\frac{F}{n^2} - \frac{F'}{n'^2} = 2\left(\frac{|m|}{n^3} - \frac{|m'|}{n'^3}\right) - \left(\frac{|m|^2}{n^4} - \frac{|m'|^2}{n'^4}\right)(1 + 2\cos^2\theta_\phi) \approx 2\left(\frac{|m|}{n^3} - \frac{|m'|}{n'^3}\right) - 3\left(\frac{|m|^2}{n^4} - \frac{|m'|^2}{n'^4}\right) \qquad (190)$$

It is the internal phase angle $\theta_\phi$ of the azimuthal angle that introduces the magnetic quantum number into the energy eigenvalues given in equation (177) and into the formula for the transition energy given in equation (187).

The pressure variation of the energy eigenvalues will now be calculated. From equation (179) it follows that

$$\frac{\partial E_{nm}}{\partial P} = \frac{\mu Z^2 e^4}{\hbar^2\bar{n}^2}\left[\cos(2\theta_n)\,\frac{1}{\bar{n}}\frac{\partial\bar{n}}{\partial P} + \sin(2\theta_n)\,\frac{\partial\theta_n}{\partial P}\right] \qquad (191)$$

The derivatives on the right hand side of equation (191) are easily evaluated. The value of $\partial\bar{n}/\partial P$ is obtained from equation (118) to be

$$\frac{1}{\bar{n}}\frac{\partial\bar{n}}{\partial P} = -\kappa_1\frac{\partial\theta_\phi}{\partial P} \qquad (192)$$

where

$$\kappa_1 = \frac{(|m|/n)(2 - |m|/n)\sin\theta_\phi\cos\theta_\phi}{1 - (|m|/n)(2 - |m|/n)\sin^2\theta_\phi} \approx \frac{|m|}{n}\left(2 - \frac{|m|}{n}\right)\sin\theta_\phi\cos\theta_\phi \qquad (193)$$

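The value of $\partial\theta_n/\partial P$ can be obtained from equation (117) with the result

$$\frac{\partial\theta_n}{\partial P} = -\kappa_2\frac{\partial\theta_\phi}{\partial P} \qquad (194)$$

where

$$\kappa_2 = \frac{n|m|\cos(2\theta_\phi) + |m|^2\sin^2\theta_\phi}{n^2 - |m|(2n - |m|)\sin^2\theta_\phi} \approx \frac{|m|}{n} \qquad (195)$$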

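Placing equations (192) and (194) into equation (191) gives

$$\frac{\partial E_{nm}}{\partial P} = -\frac{\mu Z^2 e^4}{\hbar^2\bar{n}^2}\left[\kappa_1\cos(2\theta_n) + \kappa_2\sin(2\theta_n)\right]\frac{\partial\theta_\phi}{\partial P} \qquad (196)$$

where from equations (115) and (116)

$$\cos(2\theta_n) \approx 1 - 2\left(\frac{|m|}{n}\right)^2\cos^2\theta_\phi\sin^2\theta_\phi \approx 1 \qquad (197)$$

$$\sin(2\theta_n) = -\frac{2|m|}{n^2}(n - |m|\sin^2\theta_\phi)\sin\theta_\phi\cos\theta_\phi \approx -\frac{2|m|}{n}\sin\theta_\phi\cos\theta_\phi \qquad (198)$$

$$\theta_n \approx -\frac{|m|}{n}\theta_\phi \qquad (199)$$

for small $\theta_\phi$. Therefore combining equations (193) and (195) through (198) gives

$$\frac{\partial E_{nm}}{\partial P} \approx -\frac{\mu Z^2 e^4}{\hbar^2 n^2}\,\frac{|m|}{n}\left(2 - \frac{3|m|}{n}\right)\sin\theta_\phi\cos\theta_\phi\,\frac{\partial\theta_\phi}{\partial P} \qquad (200)$$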


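The rate of change of the energy eigenvalues with pressure can be positive or negative according to the value of $|m|/n$. In particular, if $\partial\theta_\phi/\partial P > 0$, equation (200) gives

$$\frac{\partial E_{nm}}{\partial P} < 0 \quad \text{if } 0 < |m|/n < 2/3 \qquad (201)$$

$$\frac{\partial E_{nm}}{\partial P} > 0 \quad \text{if } 2/3 < |m|/n < 1 \qquad (202)$$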


For $0 < |m|/n < 2/3$ an increase in pressure will lead to a greater binding energy per electron, while for $|m|/n > 2/3$ an increase in pressure will produce less binding energy for the valence electron. These conclusions also follow directly from equations (182) and (183).
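The sign statements in equations (184) through (186) and in (201) and (202) all come from the factor (|m|/n)(2 − 3|m|/n). The short sketch below tabulates them: it evaluates F from equation (183), the bracket of equation (187) in units of μZ²e⁴/(2ħ²), and the sign of ∂E_nm/∂P from equation (200). As in the text, it assumes ∂θ_φ/∂P > 0, and it additionally assumes θ_φ is small and positive so that sinθ_φ cosθ_φ > 0. The function names are illustrative only.

```python
import math
from fractions import Fraction


def F_exact(n, m, theta_phi):
    """Equation (183): F = (|m|/n)[2 - (|m|/n)(1 + 2 cos^2 theta_phi)]."""
    x = abs(m) / n
    return x * (2 - x * (1 + 2 * math.cos(theta_phi) ** 2))


def F_small(n, m):
    """Small-angle limit of (183), kept exact with fractions so |m|/n = 2/3 gives 0."""
    x = Fraction(abs(m), n)
    return x * (2 - 3 * x)


def transition(n, m, n2, m2, theta_phi):
    """Bracket of equation (187) for n2,m2 -> n,m, in units of mu Z^2 e^4/(2 hbar^2)."""
    s2 = math.sin(theta_phi) ** 2
    return (1 / n**2 - 1 / n2**2
            + (F_exact(n, m, theta_phi) / n**2 - F_exact(n2, m2, theta_phi) / n2**2) * s2)


def dE_dP_sign(n, m):
    """Sign of dE_nm/dP from (200), assuming small theta_phi > 0 and dtheta_phi/dP > 0."""
    value = -F_small(n, m)          # (200) is proportional to -(|m|/n)(2 - 3|m|/n)
    return (value > 0) - (value < 0)


for n in range(1, 5):
    print(n, [(m, dE_dP_sign(n, m)) for m in range(n)])
```

With these assumptions the M-shell state with |m| = 2 gives 0 and the N-shell state with |m| = 3 gives +1, which matches the boundary at |m|/n = 2/3. At θ_φ = 0, `transition` reduces to the Bohr bracket 1/n² − 1/n'².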


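Finally, the simplest generalization of equation (9) for the radii of the Bohr orbits is given by

$$\tilde{r}_n = \tilde{n}^2\hbar^2/(\mu Z e^2) \qquad (203)$$

and therefore combining equations (97), (118) and (203) gives

$$r_n = \bar{n}^2\hbar^2/(\mu Z e^2) = \frac{n^2\hbar^2}{\mu Z e^2}\left[1 - \frac{|m|}{n}\left(2 - \frac{|m|}{n}\right)\sin^2\theta_\phi\right] \qquad (204)$$

$$\theta_{r_n} = 2\theta_n \approx -\frac{2|m|}{n}\theta_\phi \qquad (205)$$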


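where $\theta_n$ is given by equations (117) and (120). The radii of the Bohr orbits of an atom with broken internal symmetry are pressure dependent through $\theta_\phi(P)$. The measured Bohr radii are given by

$$r_{nm} = r_n\cos\theta_{r_n} \approx \frac{n^2\hbar^2}{\mu Z e^2}\,(1 - G\sin^2\theta_\phi) \qquad (206)$$

where

$$G = \frac{|m|}{n}\left[2 + \frac{|m|}{n}\cos(2\theta_\phi)\right] \approx \frac{|m|}{n}\left(2 + \frac{|m|}{n}\right) \qquad (207)$$

where $G \ge 0$. From equations (206) and (207) it follows that the measured Bohr radii are pressure dependent and

$$\frac{\partial r_{nm}}{\partial P} \approx -\frac{2n^2\hbar^2}{\mu Z e^2}\,\frac{|m|}{n}\left(2 + \frac{|m|}{n}\right)\sin\theta_\phi\cos\theta_\phi\,\frac{\partial\theta_\phi}{\partial P} \qquad (208)$$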

Therefore, if $\partial\theta_\phi/\partial P > 0$, it follows that $\partial r_{nm}/\partial P < 0$ for $m \ne 0$. The analysis leading to equations (203) through (208) is simplistic because higher order terms associated with the azimuthal angular momentum $\ell$ need to be inserted into equation
(203).* 2  The  results  for  the  energy  levels  and  atomic  radii  may  possibly  be 
tested using circular atoms, for which $|m| = \ell = n - 1$.15
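Equation (206) is a first-order result, so it is worth checking how closely 1 − G sin²θ_φ tracks the product of the two factors it was built from. The sketch below multiplies the bracket of equation (204) by the small-angle cos(2θ_n) of equation (181) and compares the result with equation (206). It assumes a_n = n²ħ²/(μZe²) as the unit of length, the helper names are hypothetical, and the angle is an arbitrary illustration. It also confirms that G stays non-negative, which is what makes ∂r_nm/∂P negative when ∂θ_φ/∂P > 0.

```python
import math


def G_coeff(n, m, theta_phi):
    """Equation (207): G = (|m|/n)[2 + (|m|/n) cos(2 theta_phi)]."""
    x = abs(m) / n
    return x * (2 + x * math.cos(2 * theta_phi))


def radius_product(n, m, theta_phi):
    """(204) bracket times the (181) form of cos(2 theta_n): r_nm / a_n before linearizing."""
    x = abs(m) / n
    s2, c2 = math.sin(theta_phi) ** 2, math.cos(theta_phi) ** 2
    return (1 - x * (2 - x) * s2) * (1 - 2 * x * x * c2 * s2)


def radius_206(n, m, theta_phi):
    """Equation (206): r_nm / a_n ~ 1 - G sin^2(theta_phi)."""
    return 1 - G_coeff(n, m, theta_phi) * math.sin(theta_phi) ** 2


theta = 0.02   # illustrative small phase angle in radians
worst = max(abs(radius_product(n, m, theta) - radius_206(n, m, theta))
            for n in range(1, 8) for m in range(n))
print(worst)   # the neglected terms are fourth order in theta_phi
```

The first-order terms of the two expressions agree exactly, because x(2 − x) + 2x²cos²θ_φ equals x[2 + x cos(2θ_φ)] with x = |m|/n.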
8. CONCLUSION. The pressure dependence of the energy levels and radii of hydrogen-like atoms can be determined by taking into account the broken symmetries of the coordinates of the electron and the nucleus. The broken symmetries of the azimuthal and zenith angles are due essentially to a broken symmetry pressure field, but the vacuum state also exhibits this broken symmetry.16 The broken internal symmetries of the coordinates require the magnetic, azimuthal and principal quantum numbers to be complex numbers associated with internal phase angles. The internal phase angles of the three quantum numbers are expected to be pressure dependent. Schrödinger's equation for a hydrogen-like atom can be solved with complex quantum numbers, and break-off solutions to the azimuthal angle, zenith angle and radial equations can be obtained.

The broken symmetry of the azimuthal angle may be associated with a vector boson which can be called the "muthon". The muthon may have important physical effects in systems having large scale broken azimuthal symmetry, perhaps such as the layered copper oxide structures of high temperature superconductors, where it may serve as the intermediary particle for the formation of Cooper pairs of electrons or holes. Alternatively, the muthon could also play a role as a component of dark matter in galaxies, where large scale broken azimuthal symmetry may occur due to the broken internal symmetry of the gravitational field.
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NEWTONIAN  GRAVITY  IN  MATTER 
WITH  BROKEN  INTERNAL  SYMMETRY 


Richard  A.  Weiss 

U.  S.  Army  Engineer  Waterways  Experiment  Station 
Vicksburg,  Mississippi  39180 


ABSTRACT. The pressure field in matter is associated with a broken internal symmetry which manifests itself through the broken internal symmetry of space and time coordinates. This introduces an apparent non-Newtonian behaviour of gravity in matter. The effective Newtonian gravitational constant for a spherical body composed of matter with broken internal symmetry is calculated and found to be a function of radial distance from the center of a planet or star. The gravity field of a rotating, geometrically asymmetric planet composed of matter with broken internal symmetries is investigated. A theoretical analysis of Eötvös, mine shaft, borehole and tower gravity variation experiments is presented in terms of Newtonian gravity in matter with broken internal symmetry. It is found that the discrepancies from Newtonian gravity can be described by ordinary Newtonian gravity in matter with broken internal symmetry combined with the variation of atmospheric pressure down a mine shaft or borehole and up a tower. This research will affect the calculation of trajectories of missiles and projectiles in the earth's atmosphere, and will have applications to geophysics and astrophysics.

1. INTRODUCTION. Discrepancies between Newton's law of gravitation and the measured variation of gravity with distance and composition of the attracting bodies have been observed. These discrepancies appeared first in the measurements of the variation of the gravity force with depth in mine shafts.1-4 These measurements indicate a larger value of the gravitational constant than is found from laboratory Eötvös experiments.1-4 On the other hand, recent experiments on the variation of gravity up the length of a tower suggest a value of the gravitational constant which is less than that measured in the laboratory by Eötvös experiments.5 Differences in behaviour from Newtonian gravity have also been reported for Eötvös-type experiments and for beam balance experiments.-15 Other evidence for non-Newtonian behaviour has been presented from solar system and stellar system measurements.16-18

Attempts have been made to explain these measured results by introducing new types of gravitational forces (the "fifth" and "sixth" forces) that have finite ranges of the order of hundreds or thousands of meters.9-23 These new forces would represent the effects of massive spin 0 and spin 1 supersymmetric partners to the ordinary massless spin 2 graviton that mediates Newtonian gravitation with its infinite range.19-23 Much criticism of the reality of these finite range forces has been presented.24,25 This is due in part to the difficulty of separating extraneous effects due to geological structure from the possible intrinsic non-Newtonian behaviour of gravity. In fact, recent data from a borehole in the ice of a glacier in Greenland suggest that the gravitational constant is less than that measured by laboratory Eötvös experiments; this disagrees with the results given in References 1-4 but agrees with the observations in Reference 5. The state of both the experimental and the theoretical situation is therefore uncertain.

This paper suggests an alternative explanation for the apparent non-Newtonian behaviour of gravity in the earth, based on ordinary Newtonian gravitation and the broken symmetry of the thermodynamic and mechanical parameters of bulk matter, such as pressure and internal energy.26,27 Some results have already been obtained toward describing the apparent non-Newtonian behaviour of gravity in terms of the ordinary Newtonian gravity field in matter with broken internal symmetries.28 This was done by showing that the space and time coordinates exhibit broken symmetries in matter where the pressure has a broken internal symmetry.28 Section 2 introduces the relationship between Newtonian gravity and the broken internal symmetries of space and time. Section 3 deals with complex number coordinates and the measurement of space and time, Section 4 considers Newtonian gravity for rotating non-spherical masses composed of matter that induces broken symmetries in the pressure and coordinates, Section 5 presents a theory for the description of the Eötvös, mine shaft, borehole and tower experiments, and finally Section 6 gives a numerical calculation of the expected values of the internal phase angles of the radial and angular coordinates due to the earth's gravity field.

2. NEWTONIAN GRAVITY AND BROKEN INTERNAL SYMMETRIES. A gauge theory of relativistic thermodynamics has been developed which is based on a trace equation that, for completely symmetrical matter or radiation, is given by29


0  +  T 


3V  ~(PV) 
dV  u 


(1) 


where $U$ = relativistic internal energy, $P$ = relativistic pressure, $T$ = absolute temperature, $V$ = volume of substance, and $U_a$ and $P_a$ = corresponding nonrelativistic internal energy and pressure. Throughout this paper the index "a" will refer to nonrelativistic calculations. The temperature and volume are parameters for both the renormalized and unrenormalized systems. The trace equation for matter whose thermodynamic functions have broken internal symmetries is given by27


3V  -4t(PV)  _ 
dV  u 


U  +  T 


m 


PaV 


(2) 


where $\tilde{U}$ and $\tilde{P}$ are complex number representations of the renormalized internal energy and pressure respectively, and where $T$ and $V$ are the magnitudes of the complex number temperature and volume respectively. Equation (2) can be further simplified by using the following complex number form of the Gibbs-Helmholtz-Maxwell equation2


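$$\left(\frac{\partial\tilde{U}}{\partial V}\right)_T = T\left(\frac{\partial\tilde{P}}{\partial T}\right)_V - \tilde{P} \qquad (3)$$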


The complex numbers $\tilde{U}$ and $\tilde{P}$ that appear in equation (2) are written as

$$\tilde{U} = U e^{j\theta_U} \qquad (4)$$

$$\tilde{P} = P e^{j\theta_P} \qquad (5)$$


where $U$, $P$, $\theta_U$ and $\theta_P$ can be obtained from a solution of equations (2) and (3). The temperature and volume parameters that appear in equation (2) are real numbers. However, the temperature and volume themselves are complex numbers that are written as

$$\tilde{T} = T e^{j\theta_T} \qquad (6)$$

$$\tilde{V} = V e^{j\theta_V} \qquad (7)$$

where $T$ and $V$ are the magnitudes of the temperature and volume, and it is these quantities that appear in the trace equation (2). The measured thermodynamic quantities are given by28

$$U_m = U\cos\theta_U \qquad (8)$$

$$P_m = P\cos\theta_P \qquad (9)$$

$$V_m = V\cos\theta_V \qquad (10)$$

$$T_m = T\cos\theta_T \qquad (11)$$

The phase angles $\theta_U$ and $\theta_P$ are obtained from equations (2) and (3), while $\theta_V$ and $\theta_T$ are related to the coordinate and velocity internal phase angles, as will be shown later.

The  determination  of  the  space  and  time  coordinate  internal  phase  angles 
follows  from  the  complex  number  Euler  equations'-3 

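$$\rho\,\frac{d\tilde{v}}{d\tilde{t}} = -\cos\beta_{r,r}\,\frac{\partial\tilde{P}}{\partial r} + \rho\tilde{F}_r \qquad (12)$$

where the complex number external force (such as gravity) is written as

$$\tilde{F}_r = F_r e^{j\theta_{F_r}} = -\partial\tilde{W}/\partial\tilde{r} \qquad (13)$$

and where $\tilde{v}$ = complex number velocity, $\tilde{t}$ = complex number time and $\tilde{r}$ = complex number radial coordinate. The complex number velocity and space and time coordinates are written as23

$$\tilde{t} = t e^{j\theta_t}$$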
The measured values of the space and time coordinates and the particle velocity are given by28

$$r_m = r\cos\theta_r \qquad (17)$$

$$t_m = t\cos\theta_t \qquad (18)$$

$$v_m = v\cos\theta_v \qquad (19)$$

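For matter in equilibrium equation (12) becomes

$$\rho\,\frac{d\tilde{v}}{d\tilde{t}} = -\cos\beta_{r,r}\,\frac{\partial\tilde{P}}{\partial r} + \rho\tilde{F}_r = 0 \qquad (20)$$

where

$$\frac{d\tilde{v}}{d\tilde{t}} = \left[\left(\frac{dv}{dt}\right)^2 + \left(v\frac{d\theta_v}{dt}\right)^2\right]^{1/2}\cos\beta_{t,t}\,e^{j\Phi_v} \qquad (21)$$

$$\cos\beta_{r,r}\,\frac{\partial\tilde{P}}{\partial r} = \left[\left(\frac{\partial P}{\partial r}\right)^2 + \left(P\frac{\partial\theta_P}{\partial r}\right)^2\right]^{1/2}\cos\beta_{r,r}\,e^{j\Phi_P} \qquad (22)$$

$$\Phi_v = \theta_v + \beta_{v,t} - \theta_t - \beta_{t,t} = \theta_r + \beta_{r,r} + \beta_{v,t} - 2(\theta_t + \beta_{t,t}) \qquad (23)$$

$$\Phi_P = \theta_P + \beta_{P,r} \qquad (24)$$

$$\tan\beta_{v,t} = \frac{v\,d\theta_v/dt}{dv/dt} \qquad (25)$$

$$\tan\beta_{t,t} = t\,\partial\theta_t/\partial t \qquad (26)$$

$$\tan\beta_{P,r} = \frac{P\,\partial\theta_P/\partial r}{\partial P/\partial r} \qquad (27)$$

$$\tan\beta_{r,r} = r\,\partial\theta_r/\partial r \qquad (28)$$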

To obtain the second relation in equation (23) the following relationship is used:

$$\theta_v = \theta_r + \beta_{r,r} - \theta_t - \beta_{t,t} \qquad (29)$$

Combining equations (21) through (29) shows that the equilibrium condition given by equation (20) is equivalent to
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(30) 


dv/dt  -  0 


d6  /dt  -  0 
v 


P  Fr 


(33A) 


2  2  1/2 
[(3P/3rr  +  (P30o/3r)Z]  cos  8,.  „ 

r  r  i  r 


(33B) 


From equations (23) and (30) it follows that for equilibrium

$$\theta_v = \theta_t + \beta_{t,t} - \beta_{v,t}, \qquad \theta_r + \beta_{r,r} + \beta_{v,t} = 2(\theta_t + \beta_{t,t}) \qquad (34)$$


Neglecting the $\beta$'s in equation (34) gives the following approximation

$$\theta_v \approx \theta_t \approx \theta_r/2 \qquad (35)$$

The relationship between $\theta_r$ and $\theta_P$ is obtained from equations (33A) and (33B). For gravity, equations (33A) and (33B) yield a set of coupled differential equations for $P$, $\theta_P$ and $\theta_r$.28 An approximate solution of equation (33A) gives the following equation for matter in a gravity field28

$$\theta_r \approx -\theta_P \qquad (36)$$

Then equations (35) and (36) give for a gravitating system

$$\theta_v \approx \theta_t \approx -\theta_P/2 \qquad (37)$$

For a general system one has

$$\theta_r \approx -\alpha\,\theta_P \qquad (38)$$

where $\alpha$ = index that describes the state equation for matter. Equations (30) through (33) give the general conditions of equilibrium. For photons $\theta_t = \theta_r$, so that the light speed has a zero internal phase angle.

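From equation (36) and the relation $\tilde{V} \approx \tfrac{4}{3}\pi\tilde{r}^3$ it follows that the phase angle for the volume is given by

$$\theta_V \approx 3\theta_r \approx -3\theta_P$$

for a gravitating system. For a uniform system the volume is given by

$$\tilde{V} = \tfrac{4}{3}\pi\tilde{r}^3 = e^{j\theta_V}\,\tfrac{4}{3}\pi r^3 \qquad (41)$$

so that

$$V = V_a \qquad (42)$$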

The renormalized and unrenormalized scalar coordinates are parameters related by

$$r = r_a \qquad (43)$$

$$\psi = \psi_a \qquad (44)$$

$$\phi = \phi_a \qquad (45)$$

where $r$ = magnitude of the complex number radial coordinate, $\psi$ = magnitude of the complex number zenith angle and $\phi$ = magnitude of the complex number azimuthal angle. Thus $V$ and $V_a$ are simply equivalent parameters in the trace equations (1) and (2).

The broken symmetry phase angle of the temperature, $\theta_T$, is determined from the energy equipartition theorem, which can be written for a complex number particle velocity as

$$\tilde{\epsilon} = \langle \tfrac{1}{2}m\tilde{v}^2 \rangle = k\tilde{T} \qquad (46)$$

where $\tilde{\epsilon}$ = complex number average kinetic energy per particle, $m$ = particle mass and $k$ = Boltzmann constant. The real and imaginary parts of equation (46) can be written as

$$\epsilon_R = \frac{m}{2}\int_0^\infty v^2\cos(2\theta_v)\,g(v)\,dv = kT\cos\theta_T \qquad (47)$$

$$\epsilon_I = \frac{m}{2}\int_0^\infty v^2\sin(2\theta_v)\,g(v)\,dv = kT\sin\theta_T \qquad (48)$$

where $g(v)$ = renormalized molecular velocity distribution function. Then

$$\tan\theta_T = \epsilon_I/\epsilon_R \qquad (49)$$

$$kT = (\epsilon_R^2 + \epsilon_I^2)^{1/2} \qquad (50)$$

which are the equations for $\theta_T$ and $T$. For a gravitating system $\theta_v$ is given by equation (37), and in this case $\theta_v$ is independent of the molecular speeds. Therefore for this case it follows from equation (37) and equations (46) through (48) that

$$\theta_T = 2\theta_v \approx -\theta_P \qquad (51)$$
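Equations (47) through (51) can be illustrated numerically. When θ_v does not depend on the molecular speed, the factors cos(2θ_v) and sin(2θ_v) come outside the integrals, so equation (49) returns θ_T = 2θ_v whatever g(v) is. The sketch below confirms this with a trapezoidal rule. The unnormalised Maxwell-like weight g(v) = v² exp(−v²), the unit mass, the cut-off vmax and the value θ_v = −0.01 are illustrative assumptions, not values from the paper.

```python
import math


def kinetic_moments(theta_v, m=1.0, vmax=10.0, steps=20000):
    """Trapezoidal estimates of eps_R and eps_I in (47)-(48) for g(v) = v^2 exp(-v^2)."""
    h = vmax / steps
    eps_R = eps_I = 0.0
    for i in range(steps + 1):
        v = i * h
        weight = 0.5 if i in (0, steps) else 1.0      # trapezoidal end-point weights
        integrand = 0.5 * m * v * v * (v * v * math.exp(-v * v))
        eps_R += weight * integrand * math.cos(2 * theta_v) * h
        eps_I += weight * integrand * math.sin(2 * theta_v) * h
    return eps_R, eps_I


theta_v = -0.01
eps_R, eps_I = kinetic_moments(theta_v)
theta_T = math.atan2(eps_I, eps_R)    # equation (49)
kT = math.hypot(eps_R, eps_I)         # equation (50)
print(theta_T, 2 * theta_v, kT)
```

For this particular weight the magnitude kT comes out close to 3√π/16, the value of (m/2)∫v⁴exp(−v²)dv with m = 1; a speed-dependent θ_v would break the simple relation θ_T = 2θ_v.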
Because the magnitude of the temperature $T$ must appear on both sides of the basic trace equations (1) and (2), it follows that

$$T = T_a \qquad (52)$$

where

$$kT_a = \frac{m}{2}\int_0^\infty v_a^2\,g_a(v_a)\,dv_a \qquad (53)$$

In fact equation (52) implies the validity of equation (51) and the relation $g = g_a$. Therefore $T$ and $T_a$ simply play the roles of equivalent parameters in the trace equations (1) and (2). The measured temperature is obtained from equations (11), (51) and (52) to be

$$T_m = T\cos\theta_T \approx T\cos\theta_P = T_a\cos\theta_P \qquad (54)$$


In view of the complex number values of the volume and temperature, it might be thought that the trace equation (2) should be written in the following completely asymmetric form


U  +  T 


($)_  - 3* 
PV 


dV 


Ua  +  Ta 


(5). 


(55) 


pa\ra 


but this is not correct, as can be seen by applying equation (55) to real classical gases. The experimental fact that the first term of the virial expansion (the ideal gas) and the second virial coefficient must be unaffected by equation (55) requires that the temperature and volume terms that appear as complex numbers on the left hand side of equation (55) must in fact appear as real numbers equal to the magnitudes of the complex number volume and temperature, as shown correctly in equations (2), (42) and (52).27 The trace equation corresponds to a system of uniform pressure and energy density, so that equations (42) and (52) are implicitly assumed in equations (1) and (2).

For systems with nonuniform pressure fields, the determination of the internal phase angles of the coordinates generally involves the solution of coupled differential equations.23 Thus for gravitating stars or planets the determination of $\theta_r$ involves the solution of the following two equations

2 -.1/2, 


cos  8 


r,r 


8p  +  tan 


A.  cr  i  j.  n 

~ T  T~~  \  ~~~  T7—  COS  8 


r,r 


1  +  P 


>/a9P/9r\ 
\  3P/3r  ) 


-  4rGp 


W  38  /3r  \ 

Mp  — - )  *  -  29  4-  it 

\  3P/3r  J  r 


(56) 


(57) 


combined with the solution of the relativistic trace equation (2), which links $\theta_P$ and $P$ to density and temperature for gases, solids or Fermi liquids with internal phase. Equation (56) is the combined equation that arises from the following equilibrium equation28


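$$\frac{\partial P}{\partial r}\cos\beta_{r,r}\left[1 + \left(\frac{P\,\partial\theta_P/\partial r}{\partial P/\partial r}\right)^2\right]^{1/2} = -\frac{GM\rho}{r^2} \qquad (58)$$

and the relationship of mass and density (which will be treated in Section 3) given by28

$$\cos\beta_{r,r}\,\frac{\partial M}{\partial r} = 4\pi r^2\rho \qquad (59)$$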


When the internal phase angles are set to zero, the equilibrium equation (58) reduces to the standard result30

$$\frac{\partial P}{\partial r} = -\frac{GM\rho}{r^2} \qquad (60)$$

The small gradient approximation to equation (57) is

$$\theta_P + P\,\frac{\partial\theta_P/\partial r}{\partial P/\partial r} = -2\theta_r \qquad (61)$$

which will be used in Section 5 for approximate solutions for $\theta_r$.

Newton's gravitational law can be written for spatial coordinates with broken internal symmetry as28

$$\tilde{g} = -GM/\tilde{r}^2 \qquad (62)$$

where $\tilde{g}$ = complex number acceleration of gravity. The measured acceleration of gravity is given by the real value of equation (62) as follows28

$$g_m = -\frac{GM}{r^2}\cos(2\theta_r) \qquad (63)$$

Written in terms of the measured radial coordinate given by equation (17), this becomes

$$g_m = -\frac{GM}{r_m^2}\cos(2\theta_r)\cos^2\theta_r \qquad (64)$$


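These formulas are valid for spherical masses. The conventional value of the acceleration of gravity is expressed in terms of the measured radial distance as

$$g_c = -\frac{GM}{r_m^2} = -\frac{GM}{r^2\cos^2\theta_r} \qquad (65)$$

and therefore

$$g_m - g_c = \frac{GM}{r_m^2}\left[1 - \cos(2\theta_r)\cos^2\theta_r\right] \approx 3\theta_r^2\,\frac{GM}{r_m^2} \qquad (66)$$

The derivatives of $g_m$ and $g_c$ with respect to $r$ are given by28

$$\frac{\partial g_m}{\partial r} = \frac{2GM}{r^3}\left[\cos(2\theta_r) + r\frac{\partial\theta_r}{\partial r}\sin(2\theta_r)\right] - 4\pi G\rho\cos(2\theta_r) \qquad (67)$$

$$\frac{\partial g_c}{\partial r} = \frac{2GM}{r^3\cos^2\theta_r}\left(1 - \tan\theta_r\,r\frac{\partial\theta_r}{\partial r}\right) - \frac{4\pi G\rho}{\cos^2\theta_r} \qquad (68)$$

Then a parameter $D$ can be defined, given by28

$$D = \frac{\partial g_m/\partial r_m - \partial g_c/\partial r_m}{\partial g_c/\partial r_m} = \frac{\partial g_m/\partial r - \partial g_c/\partial r}{\partial g_c/\partial r} = \frac{A + B}{C} \qquad (69)$$

where

$$A = \left[\cos(2\theta_r)\cos^2\theta_r - 1\right]\left(1 - 2\pi\rho r^3/M\right) \qquad (70)$$

$$B = r\frac{\partial\theta_r}{\partial r}\left[\sin(2\theta_r)\cos^2\theta_r + \tan\theta_r\right] \qquad (71)$$

$$C = 1 - \tan\theta_r\,r\frac{\partial\theta_r}{\partial r} - 2\pi\rho r^3/M \qquad (72)$$

For small $\theta_r$ the parameter $D$ can be written as

$$D \approx -3\theta_r^2(1 - \eta) \qquad (73)$$

$$\eta = \frac{(r/\theta_r)\,\partial\theta_r/\partial r}{1 - 2\pi\rho r^3/M} \qquad (74)$$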
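Equations (69) through (74) can be checked against each other numerically. The sketch below evaluates D exactly from A, B and C and compares it with the small-angle form −3θ_r²(1 − η); it also checks the bracket of equation (66). The inputs θ_r, r∂θ_r/∂r and q = 2πρr³/M are arbitrary illustrative numbers chosen with θ_r < 0 and ∂θ_r/∂r > 0; they are not earth data, and the function names are hypothetical.

```python
import math


def D_exact(theta, r_dtheta, q):
    """Equations (69)-(72); r_dtheta = r dtheta_r/dr, q = 2 pi rho r^3 / M."""
    A = (math.cos(2 * theta) * math.cos(theta) ** 2 - 1) * (1 - q)
    B = r_dtheta * (math.sin(2 * theta) * math.cos(theta) ** 2 + math.tan(theta))
    C = 1 - math.tan(theta) * r_dtheta - q
    return (A + B) / C


def D_small(theta, r_dtheta, q):
    """Equations (73)-(74): D ~ -3 theta^2 (1 - eta)."""
    eta = (r_dtheta / theta) / (1 - q)
    return -3 * theta ** 2 * (1 - eta)


theta, r_dtheta, q = -1e-3, 5e-4, 0.74     # illustrative values only
d_exact, d_small = D_exact(theta, r_dtheta, q), D_small(theta, r_dtheta, q)
print(d_exact, d_small)

# Bracket of equation (66): 1 - cos(2 theta) cos^2(theta) ~ 3 theta^2
bracket = 1 - math.cos(2 * theta) * math.cos(theta) ** 2
print(bracket, 3 * theta ** 2)
```

The relative difference between the exact and approximate D is of order θ_r, which is the size of the terms dropped in deriving equation (73).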


From equation (64) it follows that for spherical bodies the Newtonian law of gravitation in space with broken internal symmetry requires a coordinate dependent effective gravitational constant given by

$$G_r = G\cos(2\theta_r)\cos^2\theta_r \qquad (75)$$

For small values of $\theta_r$ equation (75) becomes

$$G_r = G(1 - 3\theta_r^2 + \cdots) \qquad (76)$$

Therefore for space with broken internal symmetry $G_r < G$, and $G$ represents the ideal case of gravitation in totally symmetric space ($\theta_r = 0$). For a homogeneous spherical planet or star the internal phase angle of the radial coordinate will be a function of the radial coordinate magnitude, $\theta_r = \theta_r(r)$, and is obtained as a solution to the coupled gravitational equilibrium equations (56) and (57). In general, however, $\theta_r = \theta_r(r, \psi, \phi)$ for an inhomogeneous body such as the earth, and therefore $G_r$ will depend on latitude and longitude. Equations (64) and (75) are valid for both the interior and exterior of a spherical planet.

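From equation (75) it follows that

$$\frac{\partial G_r}{\partial r} = -GE\,\frac{\partial\theta_r}{\partial r} \qquad (77)$$

$$\frac{\partial G_r}{\partial\psi} = -GE\,\frac{\partial\theta_r}{\partial\psi} \qquad (78)$$

$$\frac{\partial G_r}{\partial\phi} = -GE\,\frac{\partial\theta_r}{\partial\phi} \qquad (79)$$

where

$$E = \sin(2\theta_r)(4\cos^2\theta_r - 1) \qquad (80)$$

For small $\theta_r$, $E \approx 6\theta_r$. In Section 5 it is shown that $\theta_r < 0$ and $\partial\theta_r/\partial r > 0$ for idealized planets, so that $\partial G_r/\partial r > 0$. From equations (77) through (79) it follows that

$$\frac{r}{G_r}\frac{\partial G_r}{\partial r} = -2H\,r\frac{\partial\theta_r}{\partial r} \qquad (81)$$

$$\frac{\psi}{G_r}\frac{\partial G_r}{\partial\psi} = -2H\,\psi\frac{\partial\theta_r}{\partial\psi} \qquad (82)$$

$$\frac{\phi}{G_r}\frac{\partial G_r}{\partial\phi} = -2H\,\phi\frac{\partial\theta_r}{\partial\phi} \qquad (83)$$

where

$$H = \tan(2\theta_r) + \tan\theta_r \qquad (84)$$

For small $\theta_r$, $H \approx 3\theta_r$.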
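The expressions for E and H in equations (80) and (84) are the derivatives of G_r/G = cos(2θ_r)cos²θ_r and of its logarithm with respect to θ_r, so that equations (77) through (83) follow by the chain rule. A central-difference check makes this concrete; the step size and the sample angle below are arbitrary choices for illustration.

```python
import math


def gr_ratio(theta):
    """Equation (75): G_r / G."""
    return math.cos(2 * theta) * math.cos(theta) ** 2


def E(theta):
    """Equation (80)."""
    return math.sin(2 * theta) * (4 * math.cos(theta) ** 2 - 1)


def H(theta):
    """Equation (84)."""
    return math.tan(2 * theta) + math.tan(theta)


def central_difference(f, x, h=1e-6):
    """Second-order accurate numerical derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


theta = -0.05
slope = central_difference(gr_ratio, theta)
print(slope, -E(theta))                          # d(G_r/G)/d theta_r = -E
print(slope / gr_ratio(theta), -2 * H(theta))    # d ln(G_r)/d theta_r = -2H
print(E(1e-3) / 1e-3, H(1e-3) / 1e-3)            # close to 6 and 3 for small theta_r
```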

It is the variation of the acceleration of gravity of the earth that is determined in gravity measurements, so it is important to have a measure of the difference between the rates of change of $g_m$ and $g_c$ with respect to radial distance. One such measure is given in equation (69). Another measure considers the difference of the normalized rates of change of the acceleration of gravity, as follows


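$$D_2 = \frac{\dfrac{r}{g_m}\dfrac{\partial g_m}{\partial r} - \dfrac{r}{g_c}\dfrac{\partial g_c}{\partial r}}{\dfrac{r}{g_c}\dfrac{\partial g_c}{\partial r}} = \frac{\dfrac{r_m}{g_m}\dfrac{\partial g_m}{\partial r_m} - \dfrac{r_m}{g_c}\dfrac{\partial g_c}{\partial r_m}}{\dfrac{r_m}{g_c}\dfrac{\partial g_c}{\partial r_m}} \tag{85}$$

From equation (63) it follows that

$$\frac{r}{g_m}\frac{\partial g_m}{\partial r} = -2\left[1 + \tan(2\theta_r)\,r\frac{\partial\theta_r}{\partial r} - \frac{2\pi r^3\rho}{M}\right] \tag{86}$$

while from equation (65) it follows that

$$\frac{r}{g_c}\frac{\partial g_c}{\partial r} = -2\left[1 - \tan\theta_r\,r\frac{\partial\theta_r}{\partial r} - \frac{2\pi r^3\rho}{M}\right] \tag{87}$$

Then

$$D_2 = \frac{H\,r\,\partial\theta_r/\partial r}{1 - \tan\theta_r\,r\,\partial\theta_r/\partial r - 2\pi r^3\rho/M} \tag{88}$$

where $H$ is given by equation (84). For small values of $\theta_r$

$$D_2 \approx 3\theta_r\,\eta \tag{89}$$

where $\eta$ is given by equation (74).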

3. MEASUREMENT AND GEOMETRY OF SPACE AND TIME. It has been assumed that the complex number space and time coordinates are Euclidean and that


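$$\bar x^2 + \bar y^2 = \bar r^2 \tag{90}$$
$$\sin^2\bar\phi + \cos^2\bar\phi = 1 \tag{91}$$
$$\tan\bar\phi = \bar y/\bar x \tag{92}$$
$$\bar x = x\,e^{j\theta_x} \tag{93}$$
$$\bar y = y\,e^{j\theta_y} \tag{94}$$
$$\bar\phi = \phi\,e^{j\theta_\phi} \tag{95}$$
$$\sin\bar\phi = S_\phi\,e^{j\theta_{s\phi}} \tag{96}$$
$$\cos\bar\phi = C_\phi\,e^{-j\theta_{c\phi}} \tag{97}$$

where $\bar\phi$ = complex number azimuthal angle, and where²⁸

$$S_\phi = \left[\sin^2(\phi\cos\theta_\phi) + \sinh^2(\phi\sin\theta_\phi)\right]^{1/2} \tag{98}$$
$$C_\phi = \left[\cos^2(\phi\cos\theta_\phi) + \sinh^2(\phi\sin\theta_\phi)\right]^{1/2} \tag{99}$$
$$\tan\theta_{s\phi} = \cot(\phi\cos\theta_\phi)\tanh(\phi\sin\theta_\phi) \tag{100}$$
$$\tan\theta_{c\phi} = \tan(\phi\cos\theta_\phi)\tanh(\phi\sin\theta_\phi) \tag{101}$$

The component equations of equation (90) determine $r$ and $\theta_r$:

$$x^2\cos(2\theta_x) + y^2\cos(2\theta_y) = r^2\cos(2\theta_r) \tag{102}$$
$$x^2\sin(2\theta_x) + y^2\sin(2\theta_y) = r^2\sin(2\theta_r) \tag{103}$$

while the component equations of equation (91) are

$$S_\phi^2\cos(2\theta_{s\phi}) + C_\phi^2\cos(2\theta_{c\phi}) = 1 \tag{104}$$
$$S_\phi^2\sin(2\theta_{s\phi}) - C_\phi^2\sin(2\theta_{c\phi}) = 0 \tag{105}$$

Equations (90) through (105) also give

$$S_\phi/C_\phi = y/x \tag{106}$$
$$\theta_{s\phi} + \theta_{c\phi} = \theta_y - \theta_x \tag{107}$$
$$S_\phi^2 = \sin(2\theta_{c\phi})/\sin[2(\theta_{c\phi} + \theta_{s\phi})] \tag{108}$$
$$C_\phi^2 = \sin(2\theta_{s\phi})/\sin[2(\theta_{c\phi} + \theta_{s\phi})] \tag{109}$$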
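Equations (96) through (109) can be checked against Python's built-in complex sine and cosine. The sketch below is illustrative only. `sin_cos_polar` and `check` are hypothetical helper names. The sketch adopts the sign convention $\cos\bar\phi = C_\phi e^{-j\theta_{c\phi}}$ of (97), which is why (101) carries no minus sign. `atan2` is used in place of the bare tangents of (100) and (101) so that the quadrant is kept.

```python
import cmath
import math

def sin_cos_polar(phi, theta_phi):
    """S, theta_s, C, theta_c for phibar = phi*exp(j*theta_phi), from (98)-(101),
    with the phase conventions of (96)-(97):
        sin(phibar) = S*exp(+j*theta_s),   cos(phibar) = C*exp(-j*theta_c).
    atan2 replaces the bare tangents of (100)-(101) so the quadrant is kept."""
    a = phi * math.cos(theta_phi)          # real part of phibar
    b = phi * math.sin(theta_phi)          # imaginary part of phibar
    S = math.hypot(math.sin(a), math.sinh(b))
    C = math.hypot(math.cos(a), math.sinh(b))
    theta_s = math.atan2(math.cos(a) * math.sinh(b), math.sin(a) * math.cosh(b))
    theta_c = math.atan2(math.sin(a) * math.sinh(b), math.cos(a) * math.cosh(b))
    return S, theta_s, C, theta_c

def check(phi, theta_phi):
    """Residuals of the closed forms and of (104), (105), (108), (109)."""
    S, ts, C, tc = sin_cos_polar(phi, theta_phi)
    pb = cmath.rect(phi, theta_phi)
    return {
        "sin": abs(cmath.sin(pb) - S * cmath.exp(1j * ts)),
        "cos": abs(cmath.cos(pb) - C * cmath.exp(-1j * tc)),
        "eq104": S**2 * math.cos(2 * ts) + C**2 * math.cos(2 * tc) - 1.0,
        "eq105": S**2 * math.sin(2 * ts) - C**2 * math.sin(2 * tc),
        "eq108": S**2 - math.sin(2 * tc) / math.sin(2 * (tc + ts)),
        "eq109": C**2 - math.sin(2 * ts) / math.sin(2 * (tc + ts)),
    }

print(check(0.7, 0.2))    # all residuals should be at rounding level
```

Forms (108) and (109) require $\sin[2(\theta_{c\phi} + \theta_{s\phi})] \ne 0$. The test angle is chosen away from that case.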

The measured coordinates and angles are given by²⁸

$$x_m = x\cos\theta_x \tag{110}$$
$$y_m = y\cos\theta_y \tag{111}$$
$$r_m = r\cos\theta_r \tag{112}$$
$$\phi_m = \phi\cos\theta_\phi \tag{113}$$
Substituting equations (110) through (112) into equations (102) and (103) shows that

$$x_m^2 + y_m^2 \ne r_m^2 \tag{114}$$

which indicates that in a broken symmetry system the measured coordinates are non-Euclidean.
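The non-Euclidean statement (114) can be illustrated numerically. This hypothetical sketch builds $\bar r$ from (90), forms the measured values (110) through (112), and reports $x_m^2 + y_m^2 - r_m^2$. It assumes the principal complex square root, which the text does not specify. In this construction the mismatch vanishes when $\theta_x = \theta_y$ and is nonzero when the phases differ.

```python
import cmath
import math

def measured_mismatch(x, theta_x, y, theta_y):
    """Return x_m^2 + y_m^2 - r_m^2 for xbar = x e^{j theta_x}, ybar = y e^{j theta_y}."""
    xb = cmath.rect(x, theta_x)
    yb = cmath.rect(y, theta_y)
    rb = cmath.sqrt(xb**2 + yb**2)          # equation (90), principal root (assumption)
    r, theta_r = abs(rb), cmath.phase(rb)
    x_m = x * math.cos(theta_x)             # (110)
    y_m = y * math.cos(theta_y)             # (111)
    r_m = r * math.cos(theta_r)             # (112)
    return x_m**2 + y_m**2 - r_m**2

print(measured_mismatch(3.0, 0.2, 4.0, 0.2))   # equal phases: essentially zero
print(measured_mismatch(3.0, 0.1, 4.0, 0.3))   # unequal phases: nonzero, cf. (114)
```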

A.  Length  of  Curves 

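The length of a curve in complex number space is given by

$$\bar L = L\,e^{j\theta_L} = \int\left[\bar r^2 + \left(\frac{d\bar r}{d\bar\phi}\right)^2\right]^{1/2} d\bar\phi \tag{115}$$

where

$$\frac{d\bar r}{d\bar\phi} = e^{j\theta_{r\phi}}\,\frac{\sec\theta_{r,r}}{\sec\theta_{\phi,\phi}}\,\frac{dr}{d\phi} \tag{116}$$
$$\theta_{r\phi} = \theta_r + \theta_{r,r} - \theta_\phi - \theta_{\phi,\phi} \tag{117}$$
$$\sec\theta_{r,r} = \left[1 + \left(r\,\partial\theta_r/\partial r\right)^2\right]^{1/2} \tag{118}$$
$$\sec\theta_{\phi,\phi} = \left[1 + \left(\phi\,d\theta_\phi/d\phi\right)^2\right]^{1/2} \tag{119}$$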

The measured length is given by

$$L_m = L\cos\theta_L \tag{120}$$

For a circle with $\partial r/\partial\phi = 0$ the complex number length is

$$\bar L = r\int_0^{2\pi} e^{j(\theta_r + \theta_\phi + \theta_{\phi,\phi})}\sec\theta_{\phi,\phi}\,d\phi \tag{121}$$


If in addition $\theta_r$ and $\theta_\phi$ are independent of $\phi$, equation (121) becomes

$$\bar L = 2\pi r\,e^{j(\theta_r + \theta_\phi)} \tag{122}$$
$$L = 2\pi r \tag{123}$$
$$\theta_L = \theta_r + \theta_\phi \tag{124}$$

The measured circumference is

$$L_m = L\cos\theta_L = 2\pi r\cos(\theta_r + \theta_\phi) \tag{125}$$

Note that if $\theta_\phi = 0$

$$L = 2\pi r \tag{126}$$
$$\theta_L = \theta_r \tag{127}$$
$$L_m = 2\pi r\cos\theta_r = 2\pi r_m \tag{128}$$


which  is  the  result  obtained  in  Reference  28. 


B.  Area  of  a  Plane  Curve. 


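The area enclosed by a plane curve in complex number space is

$$\bar A = \frac12\oint\bar r^2\,d\bar\phi = \frac12\oint r^2e^{j(2\theta_r + \theta_\phi)}\left(d\phi + j\phi\,d\theta_\phi\right) = \frac12\oint r^2e^{j(2\theta_r + \theta_\phi + \theta_{\phi,\phi})}\sec\theta_{\phi,\phi}\,d\phi \tag{129}$$

For a circle with $r$ = constant

$$\bar A = \frac{r^2}{2}\oint e^{j(2\theta_r + \theta_\phi + \theta_{\phi,\phi})}\sec\theta_{\phi,\phi}\,d\phi \tag{130}$$

If $\theta_r$ and $\theta_\phi$ are constants

$$\bar A = A\,e^{j\theta_A} = \pi r^2e^{j(2\theta_r + \theta_\phi)} \tag{131}$$

and therefore

$$A = \pi r^2 \tag{132}$$
$$\theta_A = 2\theta_r + \theta_\phi \tag{133}$$
$$A_m = \pi r^2\cos(2\theta_r + \theta_\phi) = \pi r_m^2\cos(2\theta_r + \theta_\phi)/\cos^2\theta_r \tag{134}$$

where $A_m$ = measured area. Finally, if $\theta_\phi = 0$

$$A = \pi r^2 \tag{135}$$
$$\theta_A = 2\theta_r \tag{136}$$
$$A_m = \pi r^2\cos(2\theta_r) = \pi r_m^2\cos(2\theta_r)/\cos^2\theta_r \tag{137}$$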
which  is  the  result  obtained  in  Reference  28. 
C. Area of Surface.

Finally, if $\theta_\phi = 0$

$$\bar S = 4\pi r^2e^{j2\theta_r} \tag{149}$$
$$S = 4\pi r^2 \tag{150}$$
$$\theta_S = 2\theta_r \tag{151}$$
$$S_m = 4\pi r^2\cos(2\theta_r) = 4\pi r_m^2\cos(2\theta_r)/\cos^2\theta_r \tag{152}$$

From equations (62) and (147) it follows that Gauss's law for spatial coordinates with broken internal symmetry is given by

$$\oint\bar g\cdot d\bar S = -2\pi GM\,e^{j\theta_\phi}\left[1 - \cos\!\left(\pi e^{j\theta_\psi}\right)\right] \tag{153}$$

assuming $\theta_\psi$ and $\theta_\phi$ are constants (which can only be a crude approximation).

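D. Volume

The volume contained within a closed surface is written in complex number spherical polar coordinates as

$$\bar V = \int\bar r^2\sin\bar\psi\,d\bar\psi\,d\bar\phi\,d\bar r = \int r^2S_\psi\sec\theta_{\psi,\psi}\sec\theta_{\phi,\phi}\sec\theta_{r,r}\,e^{j\theta_V'}\,d\psi\,d\phi\,dr \tag{154}$$

where

$$\theta_V' = 3\theta_r + \theta_{r,r} + \theta_{s\psi} + \theta_\psi + \theta_{\psi,\psi} + \theta_\phi + \theta_{\phi,\phi} \tag{155}$$

If $\theta_\psi$ and $\theta_\phi$ are constants

$$\bar V = e^{j(\theta_{s\psi} + \theta_\psi + \theta_\phi)}\int r^2S_\psi\sec\theta_{r,r}\,e^{j(3\theta_r + \theta_{r,r})}\,d\psi\,d\phi\,dr \tag{156}$$

For $\theta_\psi$ and $\theta_\phi = 0$

$$\bar V = \int r^2\sin\psi\,\sec\theta_{r,r}\,e^{j(3\theta_r + \theta_{r,r})}\,d\psi\,d\phi\,dr \tag{157}$$

For $r$ and $\theta_r$ independent of $\psi$ and $\phi$ (a sphere)

$$\bar V = 4\pi\int r^2\sec\theta_{r,r}\,e^{j(3\theta_r + \theta_{r,r})}\,dr \tag{158}$$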
The component parts of equation (158) are

$$V\cos\theta_V = 4\pi\int r^2\sec\theta_{r,r}\cos(3\theta_r + \theta_{r,r})\,dr \tag{159}$$
$$V\sin\theta_V = 4\pi\int r^2\sec\theta_{r,r}\sin(3\theta_r + \theta_{r,r})\,dr \tag{160}$$

which determine $V$ and $\theta_V$ for a sphere in a gravitational field. For $\theta_r$ = constant

$$\bar V = e^{j3\theta_r}\,\frac{4\pi}{3}r^3 = \frac{4\pi}{3}\bar r^3 \tag{161}$$
$$V = \frac{4\pi}{3}r^3 \tag{162}$$
$$\theta_V = 3\theta_r \tag{163}$$
$$V_m = \frac{4\pi}{3}r^3\cos(3\theta_r) = \frac{4\pi}{3}r_m^3\cos(3\theta_r)/\cos^3\theta_r \tag{164}$$
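For constant internal phase angles, the measured circumference (125), area (134), surface (152) and volume (164) are the real parts of the corresponding complex quantities built from $\bar r$. The hypothetical sketch below evaluates those formulas and confirms that each one equals the real part of its complex counterpart. It also shows the factor $\cos(3\theta_r)/\cos^3\theta_r$ by which the measured volume differs from $\tfrac43\pi r_m^3$. The function name and the value $\theta_r = -0.05$ are illustrative assumptions, not data from the paper.

```python
import cmath
import math

def measured_sizes(r, theta_r, theta_phi=0.0):
    """Measured circumference (125), disc area (134), sphere surface (152) and
    sphere volume (164) for constant internal phase angles. The surface and
    volume forms use the theta_phi = theta_psi = 0 cases given in the text."""
    L_m = 2 * math.pi * r * math.cos(theta_r + theta_phi)
    A_m = math.pi * r**2 * math.cos(2 * theta_r + theta_phi)
    S_m = 4 * math.pi * r**2 * math.cos(2 * theta_r)
    V_m = 4 * math.pi / 3 * r**3 * math.cos(3 * theta_r)
    return L_m, A_m, S_m, V_m

r, th = 1.0, -0.05                        # illustrative values only
rbar = cmath.rect(r, th)
L_m, A_m, S_m, V_m = measured_sizes(r, th)

# Each measured size is the real part of the complex-number quantity in rbar
print(L_m, (2 * math.pi * rbar).real)
print(A_m, (math.pi * rbar**2).real)
print(S_m, (4 * math.pi * rbar**2).real)
print(V_m, (4 * math.pi / 3 * rbar**3).real)

# Ratio of measured volume to the Euclidean value built from r_m = r cos(theta_r)
r_m = r * math.cos(th)
print(V_m / (4 * math.pi / 3 * r_m**3), math.cos(3 * th) / math.cos(th) ** 3)
```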


E.  Density 

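The rest mass of a body does not have an internal phase because it is invariant under the effects of the basic trace equation (2).²⁹ The instantaneous density is given by

$$\bar\rho = \rho\,e^{j\theta_\rho} = \frac{dM}{d\bar V} = \cos\theta_{V,V}\,\frac{dM}{dV}\,e^{-j(\theta_V + \theta_{V,V})} \tag{165}$$

where

$$\tan\theta_{V,V} = V\,d\theta_V/dV \tag{166}$$

Therefore

$$\rho = \cos\theta_{V,V}\,dM/dV \tag{167}$$
$$\theta_\rho = -\theta_V - \theta_{V,V} \tag{168}$$

Combining equations (165) and (154) gives the following results for the density

$$\rho = \left(r^2S_\psi\sec\theta_{\psi,\psi}\sec\theta_{\phi,\phi}\sec\theta_{r,r}\right)^{-1}\frac{\partial^3M}{\partial\psi\,\partial\phi\,\partial r} \tag{169}$$
$$\theta_\rho = -\theta_V' \tag{170}$$

where $\theta_V'$ is given by equation (155). For radial symmetry, equations (158) and (165) give

$$\rho = \frac{\cos\theta_{r,r}}{4\pi r^2}\,\frac{\partial M}{\partial r} \tag{171}$$
$$\theta_\rho = -3\theta_r - \theta_{r,r} \tag{172}$$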


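Equation (171), combined with the following stellar equilibrium equation for broken symmetry matter²⁸

$$M = -\frac{r^2}{G\rho}\cos\theta_{r,r}\,\frac{\partial P}{\partial r}\left[1 + \left(\frac{P\,\partial\theta_P/\partial r}{\partial P/\partial r}\right)^2\right]^{1/2} = -\frac{r^2\,\partial P/\partial r}{G\rho}\cos\theta_{r,r}\sec\theta_{P,r} \tag{173}$$

gives the combined stellar equilibrium equation (56) for ordinary stars. The angle $\theta_{P,r}$ that appears in equation (173) is defined by

$$\tan\theta_{P,r} = \frac{P\,\partial\theta_P/\partial r}{\partial P/\partial r} \tag{174}$$

The measured density is given by

$$\rho_m = \rho\cos\theta_\rho \tag{175}$$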

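For a relativistic interacting system having a complex number internal energy $\bar U$, the mass is given by $\bar M = \bar U/c^2$ and the instantaneous density is

$$\bar\rho_r = c^{-2}\,d\bar U/|d\bar V| = d\bar M/|d\bar V| \tag{176}$$

Combining equations (158) and (176) gives

$$\rho_r = \frac{\cos\theta_{r,r}}{4\pi r^2}\left[\left(\frac{\partial M}{\partial r}\right)^2 + \left(M\frac{\partial\theta_M}{\partial r}\right)^2\right]^{1/2} \tag{177}$$
$$\theta_{\rho_r} = \theta_U + \theta_{U,r} \tag{178}$$

where $\theta_M = \theta_U$ and $\theta_{U,r}$ is given by

$$\tan\theta_{U,r} = \frac{U\,\partial\theta_U/\partial r}{\partial U/\partial r} = \frac{M\,\partial\theta_M/\partial r}{\partial M/\partial r} \tag{179}$$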


If special relativity is included, the pressure adds to the internal energy density, and the inertial mass density becomes³

$$\rho_I = \rho_r + P/c^2 \tag{180}$$

while the gravitational mass density is

$$\rho_G = \rho_r + 3P/c^2 \tag{181}$$

Equations (180) and (181) can be simplified by combining them with equation (158). It should be pointed out that complex number values for coordinates are also suggested by string theory.³⁴

4.  NEWTONIAN  GRAVITY  FOR  NONSPHERICAL  MASSES  WITH  BROKEN  INTERNAL  SYMMETRY. 


A.  Complex  Number  Gravitational  Potential. 

By analogy with the standard scalar form of the gravitational potential for a nonspherical body, the following expression is postulated for the complex number gravitational potential of a nonspherical body existing in space with broken internal symmetries³³⁻³⁹

$$\bar V = -\frac{GM}{\bar r}\left[1 - \sum_{n=2}^{\infty}\bar I_n\left(\frac{\bar a}{\bar r}\right)^n\bar P_n(\cos\bar\psi)\right] \tag{182}$$

where $\bar V$ = complex number potential, $\bar a$ = complex number equatorial radius, $\bar r$ = complex number radial coordinate of a point outside the body, $\bar I_n$ = complex number coefficients, and $\bar P_n(\cos\bar\psi)$ = complex number Legendre polynomials corresponding to the complex number zenith angle $\bar\psi$. The complex number quantities appearing in equation (182) can be written as

$$\bar V = V\,e^{j\theta_V} \tag{183}$$
$$\bar a = a\,e^{j\theta_a} \tag{184}$$
$$\bar I_n = I_n\,e^{j\theta_{In}} \tag{185}$$
$$\bar P_n = P_n\,e^{j\theta_{Pn}} \tag{186}$$

where, for instance, $P_n$ and $\theta_{Pn}$ = magnitude and phase angle of the complex number Legendre polynomials. The real and imaginary parts of equation (182) are given by

$$V\cos\theta_V = -\frac{GM}{r}\left[\cos\theta_r - I_2P_2\left(\frac{a}{r}\right)^2\cos(\theta_{I2} + \theta_{P2} + 2\theta_a - 3\theta_r) - \cdots\right] \tag{187}$$
$$V\sin\theta_V = \frac{GM}{r}\left[\sin\theta_r + I_2P_2\left(\frac{a}{r}\right)^2\sin(\theta_{I2} + \theta_{P2} + 2\theta_a - 3\theta_r) + \cdots\right] \tag{188}$$

Equations (187) and (188) can be used to determine $V$ and $\theta_V$. For instance

$$\tan\theta_V = -\frac{\sin\theta_r + I_2P_2(a/r)^2\sin(\theta_{I2} + \theta_{P2} + 2\theta_a - 3\theta_r) + \cdots}{\cos\theta_r - I_2P_2(a/r)^2\cos(\theta_{I2} + \theta_{P2} + 2\theta_a - 3\theta_r) - \cdots} \tag{189}$$

while squaring and adding equations (187) and (188) gives $V$. In the limit $a/r \to 0$, equations (187) and (188) become

$$V_m = V_R = V\cos\theta_V = -\frac{GM}{r}\cos\theta_r = -\frac{GM}{r_m}\cos^2\theta_r \tag{190}$$
$$V_I = V\sin\theta_V = \frac{GM}{r}\sin\theta_r = \frac{GM}{r_m}\sin\theta_r\cos\theta_r \tag{191}$$

which corresponds to a point mass whose complex number potential is given by

$$\bar V = -GM/\bar r \tag{192}$$

The  various  terms  in  the  gravitational  potential  will  now  be  considered. 
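Equations (187) and (188) are the real and imaginary parts of (182). The sketch below gives a quick numerical cross-check, truncating (182) after $n = 2$. The function names and parameter values are hypothetical and carry no physical significance.

```python
import cmath
import math

def potential_complex(GM, r, th_r, a, th_a, I2, th_I2, P2, th_P2):
    """Equation (182) truncated after n = 2."""
    rb, ab = cmath.rect(r, th_r), cmath.rect(a, th_a)
    I2b, P2b = cmath.rect(I2, th_I2), cmath.rect(P2, th_P2)
    return -(GM / rb) * (1 - I2b * (ab / rb) ** 2 * P2b)

def potential_components(GM, r, th_r, a, th_a, I2, th_I2, P2, th_P2):
    """V cos(theta_V) and V sin(theta_V) from (187) and (188), truncated after n = 2."""
    phase = th_I2 + th_P2 + 2 * th_a - 3 * th_r
    k = I2 * P2 * (a / r) ** 2
    re = -(GM / r) * (math.cos(th_r) - k * math.cos(phase))
    im = (GM / r) * (math.sin(th_r) + k * math.sin(phase))
    return re, im

args = (1.0, 2.0, -0.01, 1.0, -0.005, 1e-3, 0.02, 0.4, 0.1)   # hypothetical values
V = potential_complex(*args)
print(V.real, V.imag)
print(*potential_components(*args))
# Magnitude and phase of Vbar; the phase is atan2 of (188) over (187), cf. (189)
print(abs(V), cmath.phase(V))
```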

B.  Complex  Number  Legendre  Polynomials. 

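The appearance of $\bar P_n(\cos\bar\psi)$ in equation (182) needs some explanation. Following the standard prescription for obtaining Legendre polynomials for scalar angles, the following generalizations to complex angles are given⁴⁰

$$\bar P_0 = 1 \tag{193}$$
$$\bar P_1 = \cos\bar\psi \tag{194}$$
$$\bar P_2 = \tfrac12\left(3\cos^2\bar\psi - 1\right) \tag{195}$$
$$\bar P_3 = \tfrac12\left(5\cos^3\bar\psi - 3\cos\bar\psi\right) \tag{196}$$
$$\bar P_4 = \tfrac18\left(35\cos^4\bar\psi - 30\cos^2\bar\psi + 3\right) \tag{197}$$
$$\bar P_5 = \tfrac18\left(63\cos^5\bar\psi - 70\cos^3\bar\psi + 15\cos\bar\psi\right) \tag{198}$$
$$\bar P_6 = \tfrac1{16}\left(231\cos^6\bar\psi - 315\cos^4\bar\psi + 105\cos^2\bar\psi - 5\right) \tag{199}$$

where²⁰

$$\bar\psi = \psi\,e^{j\theta_\psi} \tag{200}$$
$$\cos\bar\psi = C_\psi\,e^{-j\theta_{c\psi}} \tag{201}$$
$$C_\psi = \left[\cos^2(\psi\cos\theta_\psi) + \sinh^2(\psi\sin\theta_\psi)\right]^{1/2} \tag{202}$$
$$\tan\theta_{c\psi} = \tan(\psi\cos\theta_\psi)\tanh(\psi\sin\theta_\psi) \tag{203}$$

From equations (186) and (194) it follows that

$$P_1 = C_\psi \tag{204}$$
$$\theta_{P1} = -\theta_{c\psi} \tag{205}$$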
Equations (186) and (195) give

$$P_2\cos\theta_{P2} = \tfrac12\left[3C_\psi^2\cos(2\theta_{c\psi}) - 1\right] \tag{206}$$
$$P_2\sin\theta_{P2} = -\tfrac32C_\psi^2\sin(2\theta_{c\psi}) \tag{207}$$

which gives

$$P_2 = \tfrac12\left[9C_\psi^4 - 6C_\psi^2\cos(2\theta_{c\psi}) + 1\right]^{1/2} \tag{208}$$
$$\tan\theta_{P2} = \frac{-3C_\psi^2\sin(2\theta_{c\psi})}{3C_\psi^2\cos(2\theta_{c\psi}) - 1} \tag{209}$$

From equations (186) and (196) it follows that

$$P_3\cos\theta_{P3} = \tfrac12\left[5C_\psi^3\cos(3\theta_{c\psi}) - 3C_\psi\cos\theta_{c\psi}\right] \tag{210}$$
$$P_3\sin\theta_{P3} = \tfrac12\left[-5C_\psi^3\sin(3\theta_{c\psi}) + 3C_\psi\sin\theta_{c\psi}\right] \tag{211}$$

which gives

$$P_3 = \tfrac12\left[25C_\psi^6 - 30C_\psi^4\cos(2\theta_{c\psi}) + 9C_\psi^2\right]^{1/2} \tag{212}$$
$$\tan\theta_{P3} = \frac{-5C_\psi^3\sin(3\theta_{c\psi}) + 3C_\psi\sin\theta_{c\psi}}{5C_\psi^3\cos(3\theta_{c\psi}) - 3C_\psi\cos\theta_{c\psi}} \tag{213}$$

and so on for $P_n$ and $\theta_{Pn}$.

The complex number Legendre polynomials can also be written in terms of the complementary angle $\bar\chi$, which is defined by

$$\sin\bar\chi = \cos\bar\psi \tag{214}$$
$$\cos\bar\chi = \sin\bar\psi \tag{215}$$

where $\bar\chi$ = complex number latitude, and

$$\sin\bar\chi = S_\chi\,e^{j\theta_{s\chi}} \tag{216}$$
$$\cos\bar\chi = C_\chi\,e^{-j\theta_{c\chi}} \tag{217}$$

and where
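$$S_\chi = \left[\sin^2(\chi\cos\theta_\chi) + \sinh^2(\chi\sin\theta_\chi)\right]^{1/2} \tag{218}$$
$$C_\chi = \left[\cos^2(\chi\cos\theta_\chi) + \sinh^2(\chi\sin\theta_\chi)\right]^{1/2} \tag{219}$$
$$\tan\theta_{s\chi} = \cot(\chi\cos\theta_\chi)\tanh(\chi\sin\theta_\chi) \tag{220}$$
$$\tan\theta_{c\chi} = \tan(\chi\cos\theta_\chi)\tanh(\chi\sin\theta_\chi) \tag{221}$$

The defining relations given by equations (214) and (215) can be written as

$$S_\chi = C_\psi \tag{222}$$
$$C_\chi = S_\psi \tag{223}$$
$$\theta_{s\chi} = -\theta_{c\psi} \tag{224}$$
$$\theta_{s\psi} = -\theta_{c\chi} \tag{225}$$

which relate $\chi$ and $\theta_\chi$ to $\psi$ and $\theta_\psi$.

Combining equation (214) with equations (193) through (199) gives

$$\bar P_0 = 1 \tag{226}$$
$$\bar P_1 = \sin\bar\chi \tag{227}$$
$$\bar P_2 = \tfrac12\left(3\sin^2\bar\chi - 1\right) \tag{228}$$
$$\bar P_3 = \tfrac12\left(5\sin^3\bar\chi - 3\sin\bar\chi\right) \tag{229}$$
$$\bar P_4 = \tfrac18\left(35\sin^4\bar\chi - 30\sin^2\bar\chi + 3\right) \tag{230}$$
$$\bar P_5 = \tfrac18\left(63\sin^5\bar\chi - 70\sin^3\bar\chi + 15\sin\bar\chi\right) \tag{231}$$
$$\bar P_6 = \tfrac1{16}\left(231\sin^6\bar\chi - 315\sin^4\bar\chi + 105\sin^2\bar\chi - 5\right) \tag{232}$$

Note also that

$$P_1 = S_\chi \tag{233}$$
$$\theta_{P1} = \theta_{s\chi} \tag{234}$$
$$P_2 = \tfrac12\left[9S_\chi^4 - 6S_\chi^2\cos(2\theta_{s\chi}) + 1\right]^{1/2} \tag{235}$$
$$\tan\theta_{P2} = \frac{3S_\chi^2\sin(2\theta_{s\chi})}{3S_\chi^2\cos(2\theta_{s\chi}) - 1} \tag{236}$$
$$P_3 = \tfrac12\left[25S_\chi^6 - 30S_\chi^4\cos(2\theta_{s\chi}) + 9S_\chi^2\right]^{1/2} \tag{237}$$
$$\tan\theta_{P3} = \frac{5S_\chi^3\sin(3\theta_{s\chi}) - 3S_\chi\sin\theta_{s\chi}}{5S_\chi^3\cos(3\theta_{s\chi}) - 3S_\chi\cos\theta_{s\chi}} \tag{238}$$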
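The closed forms (206) through (213) can be checked against a direct evaluation of the Legendre polynomials at a complex argument. The sketch below evaluates $\bar P_n$ with the standard three-term recurrence, which reproduces (193) through (199). The function names are hypothetical, and the test angle $\psi = 1$, $\theta_\psi = 0.1$ is arbitrary. Because $\sin\bar\chi = \cos\bar\psi$ by (214), the same numbers also check (235) through (238) with $S_\chi = C_\psi$ and $\theta_{s\chi} = -\theta_{c\psi}$.

```python
import cmath
import math

def legendre_complex(n, z):
    """P_n(z) for complex z via (k+1) P_{k+1} = (2k+1) z P_k - k P_{k-1}."""
    p_prev, p = 1 + 0j, z
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * z * p - k * p_prev) / (k + 1)
    return p

def p2_p3_closed_forms(psi, theta_psi):
    """Magnitudes (208), (212) and phases from (206)-(207), (210)-(211),
    using cos(psibar) = C_psi exp(-j theta_cpsi) as in (201)."""
    c = cmath.cos(cmath.rect(psi, theta_psi))
    C, th_c = abs(c), -cmath.phase(c)
    P2 = 0.5 * math.sqrt(9 * C**4 - 6 * C**2 * math.cos(2 * th_c) + 1)
    P3 = 0.5 * math.sqrt(25 * C**6 - 30 * C**4 * math.cos(2 * th_c) + 9 * C**2)
    th_P2 = math.atan2(-1.5 * C**2 * math.sin(2 * th_c),
                       0.5 * (3 * C**2 * math.cos(2 * th_c) - 1))
    th_P3 = math.atan2(0.5 * (-5 * C**3 * math.sin(3 * th_c) + 3 * C * math.sin(th_c)),
                       0.5 * (5 * C**3 * math.cos(3 * th_c) - 3 * C * math.cos(th_c)))
    return P2, th_P2, P3, th_P3

psi, theta_psi = 1.0, 0.1                  # arbitrary test angle
z = cmath.cos(cmath.rect(psi, theta_psi))
P2, th_P2, P3, th_P3 = p2_p3_closed_forms(psi, theta_psi)
print(legendre_complex(2, z), cmath.rect(P2, th_P2))
print(legendre_complex(3, z), cmath.rect(P3, th_P3))
```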


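Consider now the special case of the equator and the north pole. At the equator, equations (193) through (238) give

$$\chi = 0,\quad \psi = \pi/2,\quad \theta_\psi(\pi/2) = 0 \tag{239}$$
$$\theta_{c\chi}(0) = 0,\quad \theta_{s\chi}(0) = \theta_\chi(0) \tag{240}$$
$$\theta_{s\psi}(\pi/2) = 0,\quad \theta_{c\psi}(\pi/2) = -\theta_\chi(0) \tag{241}$$
$$S_\chi(0) = 0,\quad C_\chi(0) = 1 \tag{242}$$
$$S_\psi(\pi/2) = 1,\quad C_\psi(\pi/2) = 0 \tag{243}$$
$$\begin{aligned} &P_0 = 1, && \theta_{P0} = 0, && P_{0m} = 1\\ &P_1 = 0, && \theta_{P1} = \theta_\chi(0), && P_{1m} = 0\\ &P_2 = 1/2, && \theta_{P2} = \pi, && P_{2m} = -1/2\\ &P_3 = 0, && \theta_{P3} = \theta_\chi(0), && P_{3m} = 0\\ &P_4 = 3/8, && \theta_{P4} = 0, && P_{4m} = 3/8 \end{aligned} \tag{244}$$

where $\theta_\chi(0)$ = value of $\theta_\chi$ at the equator.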

At the north pole, the following relationships are obtained from equations (193) through (238)

$$\chi = \pi/2,\quad \psi = 0,\quad \theta_\chi(\pi/2) = 0 \tag{245}$$
$$\theta_{c\psi}(0) = 0,\quad \theta_{s\psi}(0) = \theta_\psi(0) \tag{246}$$
$$\theta_{s\chi}(\pi/2) = 0,\quad \theta_{c\chi}(\pi/2) = -\theta_\psi(0) \tag{247}$$
$$S_\psi(0) = 0,\quad C_\psi(0) = 1 \tag{248}$$
$$S_\chi(\pi/2) = 1,\quad C_\chi(\pi/2) = 0 \tag{249}$$
$$P_n = 1,\quad \theta_{Pn} = 0,\quad P_{nm} = 1 \qquad (n = 0, \ldots, 4) \tag{250}$$

where $\theta_\psi(0)$ = value of $\theta_\psi$ at the north pole.

Equations (214) and (215) are valid for complementary angles. Combining these conditions with equations (218) through (221) shows that if $\chi = 0$ then $\psi = \pi/2$, and if $\psi = 0$ then $\chi = \pi/2$, so that

$$\theta_\psi(\pi/2) = 0 \tag{251}$$
$$\theta_\chi(\pi/2) = 0 \tag{252}$$

as indicated in equations (239) and (245). This shows that for complementary angles

$$\bar\psi + \bar\chi = \pi/2 \tag{253}$$

and the right angle $\pi/2$ is not associated with an internal phase angle. The component parts of equation (253) are given by

$$\psi\cos\theta_\psi + \chi\cos\theta_\chi = \pi/2 \tag{254}$$
$$\psi\sin\theta_\psi + \chi\sin\theta_\chi = 0 \tag{255}$$

Equation (254) states that

$$\psi_m + \chi_m = \pi/2 \tag{256}$$

that is, the sum of the measured complementary angles is equal to $\pi/2$.

The angle $\pi/2$ apparently is the only angle whose internal phase angle is zero; all other angles exhibit an internal phase. Consider an angle $\bar\psi$ composed of two component parts $\bar\psi_1$ and $\bar\psi_2$, so that

$$\bar\psi = \bar\psi_1 + \bar\psi_2 \tag{257}$$
$$\psi\cos\theta_\psi = \psi_1\cos\theta_{\psi1} + \psi_2\cos\theta_{\psi2} \tag{258}$$
$$\psi\sin\theta_\psi = \psi_1\sin\theta_{\psi1} + \psi_2\sin\theta_{\psi2} \tag{259}$$

Equation (258) states that

$$\psi_m = \psi_{1m} + \psi_{2m} \tag{260}$$

which agrees with reality in that the measured total angle is the sum of the measured parts. For the special case of $\pi/2$, equations (254) and (255) give

$$\psi = \frac{(\pi/2)\sin\theta_\chi}{\cos\theta_\psi\sin\theta_\chi - \sin\theta_\psi\cos\theta_\chi} \tag{261}$$
$$\chi = -\frac{(\pi/2)\sin\theta_\psi}{\cos\theta_\psi\sin\theta_\chi - \sin\theta_\psi\cos\theta_\chi} \tag{262}$$

Alternatively, equations (254) and (255) can be written as

$$\cos\theta_\psi = \left(1 - \frac{\chi^2}{\psi^2}\sin^2\theta_\chi\right)^{1/2} \tag{263}$$
$$\left(\psi^2 - \chi^2\sin^2\theta_\chi\right)^{1/2} + \chi\cos\theta_\chi = \pi/2 \tag{264}$$
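Given the two internal phase angles, (261) and (262) fix the magnitudes $\psi$ and $\chi$ of a complementary pair. The sketch below solves for them and confirms (253), (256) and (264). The names are hypothetical and the phase values are arbitrary illustrations. The common denominator in (261) and (262) equals $\sin(\theta_\chi - \theta_\psi)$, so the construction fails when the two phases are equal.

```python
import cmath
import math

def complementary_magnitudes(theta_psi, theta_chi):
    """psi and chi from (261) and (262); the denominator is sin(theta_chi - theta_psi)."""
    d = math.cos(theta_psi) * math.sin(theta_chi) - math.sin(theta_psi) * math.cos(theta_chi)
    if abs(d) < 1e-15:
        raise ValueError("theta_psi and theta_chi must differ")
    psi = (math.pi / 2) * math.sin(theta_chi) / d
    chi = -(math.pi / 2) * math.sin(theta_psi) / d
    return psi, chi

th_psi, th_chi = 0.05, -0.1                # illustrative phases
psi, chi = complementary_magnitudes(th_psi, th_chi)
total = cmath.rect(psi, th_psi) + cmath.rect(chi, th_chi)
print(psi, chi, total)                     # total should be pi/2 with no imaginary part, cf. (253)
print(psi * math.cos(th_psi) + chi * math.cos(th_chi))                              # (256)
print(math.sqrt(psi**2 - chi**2 * math.sin(th_chi) ** 2) + chi * math.cos(th_chi))  # (264)
```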

Figure  1  shows  the  variation  of  0^  and  . 

C. Complex Number Gravity Potential Coefficients $\bar I_n$.

The dimensionless complex number coefficients $\bar I_n$ that appear in equation (182) describe the distribution of mass within a planet. Equation (182) shows that $\bar I_0 = 1$ and $\bar I_1 = 0$, because the origin of coordinates can be located at the center of mass of the planet. For practical calculations only the second-order coefficient $\bar I_2$ is retained. The value of $\bar I_2$ is obtained as an obvious generalization of the standard scalar form for this coefficient, as follows³⁵

$$\bar I_2 = I_2\,e^{j\theta_{I2}} = (\bar C - \bar A)/(M\bar a^2) \tag{265}$$

where $\bar C$ = complex number moment of inertia about the polar axis, and $\bar A$ = complex number moment of inertia about one of the transverse axes. If the $z$ axis is taken to be the polar axis³⁵

$$\bar C = C\,e^{j\theta_C} = \int\left(\bar x^2 + \bar y^2\right)dM \tag{266}$$
$$\bar A = A\,e^{j\theta_A} = \int\left(\bar x^2 + \bar z^2\right)dM = \int\left(\bar y^2 + \bar z^2\right)dM \tag{267}$$

The real and imaginary parts of equation (265) are

$$I_2\cos\theta_{I2} = \frac{1}{Ma^2}\left[C\cos(\theta_C - 2\theta_a) - A\cos(\theta_A - 2\theta_a)\right] \tag{268}$$
$$I_2\sin\theta_{I2} = \frac{1}{Ma^2}\left[C\sin(\theta_C - 2\theta_a) - A\sin(\theta_A - 2\theta_a)\right] \tag{269}$$

From equations (268) and (269) it follows that

$$I_2^2 = \frac{1}{M^2a^4}\left[C^2 + A^2 - 2AC\cos(\theta_C - \theta_A)\right] \tag{270}$$
$$\tan\theta_{I2} = \frac{C\sin(\theta_C - 2\theta_a) - A\sin(\theta_A - 2\theta_a)}{C\cos(\theta_C - 2\theta_a) - A\cos(\theta_A - 2\theta_a)} \tag{271}$$

The measured values of $\bar I_2$, $\bar C$ and $\bar A$ are given by

$$I_{2m} = I_2\cos\theta_{I2} \tag{272}$$
$$C_m = C\cos\theta_C \tag{273}$$
$$A_m = A\cos\theta_A \tag{274}$$
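The moment-of-inertia formulas (265) and (268) through (271) can be exercised with a short sketch. The values below are arbitrary illustrations in units where $M = a = 1$, not measured planetary data, and the function names are hypothetical. The phase is taken with `atan2` from (268) and (269) rather than from the bare tangent (271), so that the quadrant is unambiguous.

```python
import cmath
import math

def i2_from_moments(M, a, th_a, C, th_C, A, th_A):
    """Complex I2bar = (Cbar - Abar)/(M abar^2), equation (265)."""
    return (cmath.rect(C, th_C) - cmath.rect(A, th_A)) / (M * cmath.rect(a, th_a) ** 2)

def i2_components(M, a, th_a, C, th_C, A, th_A):
    """I2 from (270) and theta_I2 from (268)-(269)."""
    I2 = math.sqrt(C**2 + A**2 - 2 * A * C * math.cos(th_C - th_A)) / (M * a**2)
    num = C * math.sin(th_C - 2 * th_a) - A * math.sin(th_A - 2 * th_a)
    den = C * math.cos(th_C - 2 * th_a) - A * math.cos(th_A - 2 * th_a)
    return I2, math.atan2(num, den)

args = (1.0, 1.0, -1e-3, 0.3307, -2.0e-3, 0.3296, -2.5e-3)   # illustrative values
I2b = i2_from_moments(*args)
I2, th_I2 = i2_components(*args)
print(abs(I2b), I2)
print(cmath.phase(I2b), th_I2)
print("measured I2m =", I2 * math.cos(th_I2))                 # equation (272)
```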


D.  Rotational  Effects  of  a  Gravitating  Planet  for  Space  and  Time 
with  Broken  Internal  Symmetries. 

For a rotating planet (or star), the total potential also includes a rotational term, and is written as an obvious generalization of the standard scalar form, as follows³⁵⁻³⁹

$$\bar U = -\frac{GM}{\bar r} + \frac{GM}{\bar r}\bar I_2\bar P_2\left(\frac{\bar a}{\bar r}\right)^2 - \frac12\bar r^2\bar\omega^2\cos^2\bar\chi \tag{275}$$

where $\bar U$ = total complex number potential and $\bar\omega$ = complex number angular speed. The complex number angular speed has already been considered in mechanical problems.² Combining equation (275) with equations (216), (217) and (228) gives

$$\bar U = -\frac{GM}{\bar r} + \frac{GM}{\bar r}\bar I_2\left(\frac{\bar a}{\bar r}\right)^2\tfrac12\left(3S_\chi^2e^{2j\theta_{s\chi}} - 1\right) - \frac12\bar r^2\bar\omega^2C_\chi^2e^{-2j\theta_{c\chi}} \tag{276}$$

The total potential can be evaluated at the equator, $\chi = 0$, and at the north pole, $\chi = \pi/2$, as follows

$$\bar U = -\frac{GM}{\bar a} - \frac{GM\bar I_2}{2\bar a} - \frac12\bar a^2\bar\omega^2 \qquad \text{(equator)} \tag{277}$$
$$\bar U = -\frac{GM}{\bar c} + \frac{GM\bar a^2\bar I_2}{\bar c^3} \qquad \text{(north pole)} \tag{278}$$

where $S_\chi(0) = 0$ was used to obtain equation (277); $S_\chi(\pi/2) = 1$ and $\theta_{s\chi}(\pi/2) = 0$ were used to obtain equation (278); and $\bar c = c\,e^{j\theta_c}$ = complex number radius at the poles. The potentials in equations (277) and (278) must have the same value on the geoid, so the complex number flattening is given by the following generalization of the standard scalar result

$$\bar f = f\,e^{j\theta_f} = (\bar a - \bar c)/\bar a = \tfrac32\bar I_2 + \bar\omega^2\bar a^3/(2GM) \tag{279}$$

where it is assumed that $\bar a \approx \bar c$ to obtain this equation. From equation (279) it follows that

$$f\cos\theta_f = \tfrac32I_2\cos\theta_{I2} + \frac{\omega^2a^3}{2GM}\cos(2\theta_\omega + 3\theta_a) \tag{280}$$
$$f\sin\theta_f = \tfrac32I_2\sin\theta_{I2} + \frac{\omega^2a^3}{2GM}\sin(2\theta_\omega + 3\theta_a) \tag{281}$$

These equations determine $f$ and $\theta_f$. The measured value of the flattening is $f_m = f\cos\theta_f$.
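With all phase angles set to zero, (279) reduces to the classical first-order flattening formula $f = \tfrac32I_2 + \omega^2a^3/(2GM)$. The sketch below evaluates the complex form. The function name is hypothetical. The Earth-like inputs are standard textbook values used only for illustration: GM, equatorial radius, rotation rate, and the conventional $J_2$ standing in for $I_2$. The nonzero phases in the second call are arbitrary.

```python
import cmath
import math

def flattening(GM, a, th_a, omega, th_omega, I2, th_I2):
    """Complex flattening (279): fbar = (3/2) I2bar + omegabar^2 abar^3 / (2 GM).
    Returns f, theta_f and the measured flattening f_m = f cos(theta_f)."""
    fb = (1.5 * cmath.rect(I2, th_I2)
          + cmath.rect(omega, th_omega) ** 2 * cmath.rect(a, th_a) ** 3 / (2 * GM))
    f, th_f = abs(fb), cmath.phase(fb)
    return f, th_f, f * math.cos(th_f)

GM, a, omega, I2 = 3.986004418e14, 6.378137e6, 7.292115e-5, 1.08263e-3  # SI units

# Zero phases: classical first-order result (roughly 1/298)
f0, _, _ = flattening(GM, a, 0.0, omega, 0.0, I2, 0.0)
print(f0, 1 / f0)

# Arbitrary small phases: compare with the components (280) and (281)
th_a, th_w, th_I2 = -1e-4, 2e-4, 5e-4
f, th_f, f_m = flattening(GM, a, th_a, omega, th_w, I2, th_I2)
q = omega**2 * a**3 / (2 * GM)
print(f * math.cos(th_f), 1.5 * I2 * math.cos(th_I2) + q * math.cos(2 * th_w + 3 * th_a))
print(f * math.sin(th_f), 1.5 * I2 * math.sin(th_I2) + q * math.sin(2 * th_w + 3 * th_a))
```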

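The real and imaginary parts of equation (275) are

$$\begin{aligned} U\cos\theta_U ={}& -\frac{GM}{r}\cos\theta_r + \frac{GMI_2P_2a^2}{r^3}\cos(\theta_{I2} + \theta_{P2} + 2\theta_a - 3\theta_r)\\ &- \tfrac12r^2\omega^2C_\chi^2\cos(2\theta_r + 2\theta_\omega - 2\theta_{c\chi}) \end{aligned}\tag{282}$$
$$\begin{aligned} U\sin\theta_U ={}& \frac{GM}{r}\sin\theta_r + \frac{GMI_2P_2a^2}{r^3}\sin(\theta_{I2} + \theta_{P2} + 2\theta_a - 3\theta_r)\\ &- \tfrac12r^2\omega^2C_\chi^2\sin(2\theta_r + 2\theta_\omega - 2\theta_{c\chi}) \end{aligned}\tag{283}$$

Equivalently, equation (276) can be used to write

$$\begin{aligned} U\cos\theta_U ={}& -\frac{GM}{r}\cos\theta_r + \frac{3GMI_2a^2S_\chi^2}{2r^3}\cos(\theta_{I2} + 2\theta_{s\chi} + 2\theta_a - 3\theta_r)\\ &- \frac{GMI_2a^2}{2r^3}\cos(\theta_{I2} + 2\theta_a - 3\theta_r) - \tfrac12r^2\omega^2C_\chi^2\cos(2\theta_r + 2\theta_\omega - 2\theta_{c\chi}) \end{aligned}\tag{284}$$
$$\begin{aligned} U\sin\theta_U ={}& \frac{GM}{r}\sin\theta_r + \frac{3GMI_2a^2S_\chi^2}{2r^3}\sin(\theta_{I2} + 2\theta_{s\chi} + 2\theta_a - 3\theta_r)\\ &- \frac{GMI_2a^2}{2r^3}\sin(\theta_{I2} + 2\theta_a - 3\theta_r) - \tfrac12r^2\omega^2C_\chi^2\sin(2\theta_r + 2\theta_\omega - 2\theta_{c\chi}) \end{aligned}\tag{285}$$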

E.  Acceleration  of  Gravity 

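The acceleration of gravity of a rotating planet is obtained from equation (275) by

$$\bar g = g\,e^{j\theta_g} = -\frac{\partial\bar U}{\partial\bar r} = -\frac{GM}{\bar r^2} + \frac{3GM}{\bar r^2}\left(\frac{\bar a}{\bar r}\right)^2\bar I_2\bar P_2 + \bar r\bar\omega^2\cos^2\bar\chi \tag{286}$$

The real and imaginary parts of equation (286) are

$$g\cos\theta_g = -\frac{GM}{r^2}\cos(2\theta_r) + \frac{3GMa^2}{r^4}I_2P_2\cos(\theta_{I2} + \theta_{P2} + 2\theta_a - 4\theta_r) + r\omega^2C_\chi^2\cos(\theta_r + 2\theta_\omega - 2\theta_{c\chi}) \tag{287}$$
$$g\sin\theta_g = \frac{GM}{r^2}\sin(2\theta_r) + \frac{3GMa^2}{r^4}I_2P_2\sin(\theta_{I2} + \theta_{P2} + 2\theta_a - 4\theta_r) + r\omega^2C_\chi^2\sin(\theta_r + 2\theta_\omega - 2\theta_{c\chi}) \tag{288}$$

Equation (286) can also be written as

$$\bar g = -\frac{GM}{\bar r^2} + \frac{3GM}{\bar r^2}\left(\frac{\bar a}{\bar r}\right)^2\bar I_2\,\tfrac12\left(3S_\chi^2e^{2j\theta_{s\chi}} - 1\right) + \bar r\bar\omega^2C_\chi^2e^{-2j\theta_{c\chi}} \tag{289}$$

and therefore

$$\begin{aligned} g\cos\theta_g ={}& -\frac{GM}{r^2}\cos(2\theta_r) + \frac{9GMa^2}{2r^4}I_2S_\chi^2\cos(\theta_{I2} + 2\theta_{s\chi} + 2\theta_a - 4\theta_r)\\ &- \frac{3GMa^2}{2r^4}I_2\cos(\theta_{I2} + 2\theta_a - 4\theta_r) + r\omega^2C_\chi^2\cos(\theta_r + 2\theta_\omega - 2\theta_{c\chi}) \end{aligned}\tag{290}$$
$$\begin{aligned} g\sin\theta_g ={}& \frac{GM}{r^2}\sin(2\theta_r) + \frac{9GMa^2}{2r^4}I_2S_\chi^2\sin(\theta_{I2} + 2\theta_{s\chi} + 2\theta_a - 4\theta_r)\\ &- \frac{3GMa^2}{2r^4}I_2\sin(\theta_{I2} + 2\theta_a - 4\theta_r) + r\omega^2C_\chi^2\sin(\theta_r + 2\theta_\omega - 2\theta_{c\chi}) \end{aligned}\tag{291}$$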

The measured acceleration of gravity is given by $g\cos\theta_g$. Equations (287) and (288), or (290) and (291), can be used to determine $g$ and $\theta_g$. These equations can be written in terms of measured quantities by making the substitutions

$$r = r_m/\cos\theta_r \tag{292}$$
$$a = a_m/\cos\theta_a \tag{293}$$
$$c = c_m/\cos\theta_c \tag{294}$$
$$\chi = \chi_m/\cos\theta_\chi \tag{295}$$
$$\omega = \omega_m/\cos\theta_\omega \tag{296}$$
$$I_2 = I_{2m}/\cos\theta_{I2} \tag{297}$$
$$f = f_m/\cos\theta_f \tag{298}$$

where $r_m$ = measured radial coordinate, $a_m$ = measured equatorial radius, $c_m$ = measured polar radius, $\chi_m$ = measured latitude, $\omega_m$ = measured rotational speed, $I_{2m}$ = measured mass distribution coefficient, and $f_m$ = measured flattening. Substituting equations (292) through (298) into equation (287) gives


$$\begin{aligned} g_m ={}& -\frac{GM}{r_m^2}\cos(2\theta_r)\cos^2\theta_r + \frac{3GMa_m^2I_{2m}P_2\cos^4\theta_r}{r_m^4\cos^2\theta_a\cos\theta_{I2}}\cos(\theta_{I2} + \theta_{P2} + 2\theta_a - 4\theta_r)\\ &+ \frac{r_m\omega_m^2C_\chi^2}{\cos\theta_r\cos^2\theta_\omega}\cos(\theta_r + 2\theta_\omega - 2\theta_{c\chi}) \end{aligned}\tag{299}$$

Equivalently, substituting equations (292) through (298) into equation (290) gives the measured acceleration of gravity as

$$\begin{aligned} g_m ={}& -\frac{GM}{r_m^2}\cos(2\theta_r)\cos^2\theta_r + \frac{9GMa_m^2I_{2m}S_\chi^2\cos^4\theta_r}{2r_m^4\cos^2\theta_a\cos\theta_{I2}}\cos(\theta_{I2} + 2\theta_{s\chi} + 2\theta_a - 4\theta_r)\\ &- \frac{3GMa_m^2I_{2m}\cos^4\theta_r}{2r_m^4\cos^2\theta_a\cos\theta_{I2}}\cos(\theta_{I2} + 2\theta_a - 4\theta_r) + \frac{r_m\omega_m^2C_\chi^2}{\cos\theta_r\cos^2\theta_\omega}\cos(\theta_r + 2\theta_\omega - 2\theta_{c\chi}) \end{aligned}\tag{300}$$

From equations (218) and (219) it follows that $S_\chi$ and $C_\chi$ can be expressed in terms of the measured latitude by

$$S_\chi = \left[\sin^2\chi_m + \sinh^2(\chi_m\tan\theta_\chi)\right]^{1/2} \tag{301}$$
$$C_\chi = \left[\cos^2\chi_m + \sinh^2(\chi_m\tan\theta_\chi)\right]^{1/2} \tag{302}$$

For small $\theta_\chi$ it follows that

$$S_\chi \approx \sin\chi_m \tag{303}$$
$$C_\chi \approx \cos\chi_m \tag{304}$$

From equations (220) and (221) it follows that

$$\tan\theta_{s\chi} = \cot\chi_m\tanh(\chi_m\tan\theta_\chi) \tag{305}$$
$$\tan\theta_{c\chi} = \tan\chi_m\tanh(\chi_m\tan\theta_\chi) \tag{306}$$

F. Acceleration of Gravity on the Geoid

The approximate shape of the geoid for a planet with broken internal symmetries is written as a simple generalization of the standard scalar expression, as follows³⁵

$$\bar r = \bar a\left(1 - \bar f\sin^2\bar\chi\right) \tag{307}$$

From equation (307) it follows that

$$\bar r^{-2} = \bar a^{-2}\left(1 + 2\bar f\sin^2\bar\chi + \cdots\right) \tag{308}$$

Using equation (308) in the first term of equation (286) gives, for the acceleration on the geoid,

$$\bar g = -\frac{GM}{\bar a^2}\left(1 + 2\bar f\sin^2\bar\chi\right) + \frac{3GM}{\bar a^2}\bar I_2\bar P_2 + \bar a\bar\omega^2\cos^2\bar\chi \tag{309}$$

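The real and imaginary parts of equation (309) are written as

$$\begin{aligned} g\cos\theta_g ={}& -\frac{GM}{a^2}\cos(2\theta_a) - \frac{2GMfS_\chi^2}{a^2}\cos(\theta_f + 2\theta_{s\chi} - 2\theta_a)\\ &+ \frac{3GMI_2P_2}{a^2}\cos(\theta_{I2} + \theta_{P2} - 2\theta_a) + a\omega^2C_\chi^2\cos(\theta_a + 2\theta_\omega - 2\theta_{c\chi})\\ ={}& -\frac{GM}{a^2}\cos(2\theta_a) - \frac{2GMfS_\chi^2}{a^2}\cos(\theta_f + 2\theta_{s\chi} - 2\theta_a)\\ &+ \frac{9GMI_2S_\chi^2}{2a^2}\cos(\theta_{I2} + 2\theta_{s\chi} - 2\theta_a) - \frac{3GMI_2}{2a^2}\cos(\theta_{I2} - 2\theta_a)\\ &+ a\omega^2C_\chi^2\cos(\theta_a + 2\theta_\omega - 2\theta_{c\chi}) \end{aligned}\tag{310}$$
$$\begin{aligned} g\sin\theta_g ={}& \frac{GM}{a^2}\sin(2\theta_a) - \frac{2GMfS_\chi^2}{a^2}\sin(\theta_f + 2\theta_{s\chi} - 2\theta_a)\\ &+ \frac{3GMI_2P_2}{a^2}\sin(\theta_{I2} + \theta_{P2} - 2\theta_a) + a\omega^2C_\chi^2\sin(\theta_a + 2\theta_\omega - 2\theta_{c\chi})\\ ={}& \frac{GM}{a^2}\sin(2\theta_a) - \frac{2GMfS_\chi^2}{a^2}\sin(\theta_f + 2\theta_{s\chi} - 2\theta_a)\\ &+ \frac{9GMI_2S_\chi^2}{2a^2}\sin(\theta_{I2} + 2\theta_{s\chi} - 2\theta_a) - \frac{3GMI_2}{2a^2}\sin(\theta_{I2} - 2\theta_a)\\ &+ a\omega^2C_\chi^2\sin(\theta_a + 2\theta_\omega - 2\theta_{c\chi}) \end{aligned}\tag{311}$$

Equations (310) and (311) can be written in terms of measured quantities by using equations (292) through (298). The measured acceleration on the geoid is given by equation (310).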

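The acceleration of gravity at the equator is obtained by using equation (309) with equations (239) through (244), which describe the equator, with the result that

$$\bar g_e = -\frac{GM}{\bar a^2} - \frac{3GM\bar I_2}{2\bar a^2} + \bar a\bar\omega^2 \tag{312}$$

The real and imaginary parts of equation (312) give

$$g_e\cos\theta_{ge} = -\frac{GM}{a^2}\cos(2\theta_a) - \frac{3GMI_2}{2a^2}\cos(\theta_{I2} - 2\theta_a) + a\omega^2\cos(\theta_a + 2\theta_\omega) \tag{313}$$
$$g_e\sin\theta_{ge} = \frac{GM}{a^2}\sin(2\theta_a) - \frac{3GMI_2}{2a^2}\sin(\theta_{I2} - 2\theta_a) + a\omega^2\sin(\theta_a + 2\theta_\omega) \tag{314}$$

from which $g_e$ and $\theta_{ge}$ can be determined. Combining equations (279), (309) and (312) and neglecting higher-order terms gives

$$\bar g = \bar g_e\left[1 + \left(\tfrac52\bar m - \bar f\right)\sin^2\bar\chi\right] \tag{315}$$

where

$$\bar m = m\,e^{j\theta_m} = \bar\omega^2\bar a^3/(GM) \tag{316}$$

and where $\bar m$ is related to the complex number flattening $\bar f$ by equation (279), which can be rewritten as

$$\bar f = \tfrac32\bar I_2 + \bar m/2 \tag{317}$$

Equation (315) is the complex number generalization of Clairaut's equation.³⁵ Equation (315) can be written as

$$g\cos\theta_g = g_e\cos\theta_{ge} + \tfrac52g_emS_\chi^2\cos(\theta_{ge} + \theta_m + 2\theta_{s\chi}) - g_efS_\chi^2\cos(\theta_{ge} + \theta_f + 2\theta_{s\chi}) \tag{318}$$
$$g\sin\theta_g = g_e\sin\theta_{ge} + \tfrac52g_emS_\chi^2\sin(\theta_{ge} + \theta_m + 2\theta_{s\chi}) - g_efS_\chi^2\sin(\theta_{ge} + \theta_f + 2\theta_{s\chi}) \tag{319}$$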
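Equations (318) and (319) are the real and imaginary parts of (315). The hypothetical sketch below evaluates (315) directly with complex arithmetic and compares the result with (318) and (319). All numerical inputs are illustrative, and $\sin^2\bar\chi$ is entered through (216) as $S_\chi^2e^{2j\theta_{s\chi}}$.

```python
import cmath
import math

def clairaut_complex(ge, th_ge, m, th_m, f, th_f, S_chi, th_schi):
    """Equation (315): gbar = gbar_e [1 + ((5/2) mbar - fbar) sin^2(chibar)],
    with sin^2(chibar) = S_chi^2 exp(2j theta_schi) from (216)."""
    sin2 = S_chi**2 * cmath.exp(2j * th_schi)
    bracket = 1 + (2.5 * cmath.rect(m, th_m) - cmath.rect(f, th_f)) * sin2
    return cmath.rect(ge, th_ge) * bracket

def clairaut_components(ge, th_ge, m, th_m, f, th_f, S_chi, th_schi):
    """Right-hand sides of (318) and (319)."""
    k = S_chi**2
    re = (ge * math.cos(th_ge)
          + 2.5 * ge * m * k * math.cos(th_ge + th_m + 2 * th_schi)
          - ge * f * k * math.cos(th_ge + th_f + 2 * th_schi))
    im = (ge * math.sin(th_ge)
          + 2.5 * ge * m * k * math.sin(th_ge + th_m + 2 * th_schi)
          - ge * f * k * math.sin(th_ge + th_f + 2 * th_schi))
    return re, im

# Illustrative inputs; theta_ge is near pi because g_e in (312) points inward
args = (9.78, math.pi + 1e-4, 3.46e-3, 2e-4, 3.35e-3, 1e-4, 0.7, 0.01)
gb = clairaut_complex(*args)
re, im = clairaut_components(*args)
print(gb.real, re)
print(gb.imag, im)
```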
G.  Apparent  Non-Newtonian  Effects 


The measured acceleration of gravity given by equation (299) has to be compared with the conventionally calculated acceleration of gravity in order to estimate the magnitude of the discrepancy. The conventionally calculated acceleration of gravity is just the scalar form of equation (286) in which the measured distances and angles appear, as follows³⁵

$$\begin{aligned} g_c &= -\frac{GM}{r_m^2} + \frac{3GMa_m^2}{r_m^4}I_{2c}P_{2c} + r_m\omega_m^2\cos^2\chi_m\\ &= -\frac{GM}{r_m^2} + \frac{9GMa_m^2}{2r_m^4}I_{2c}\sin^2\chi_m - \frac{3GMa_m^2}{2r_m^4}I_{2c} + r_m\omega_m^2\cos^2\chi_m \end{aligned}\tag{320}$$

where the conventionally calculated second-order Legendre polynomial is written as²⁹

$$P_{2c} = \tfrac12\left(3\sin^2\chi_m - 1\right) \tag{321}$$

The conventionally calculated mass distribution coefficient $I_{2c}$ is similar to equation (265) in that³⁵

$$I_{2c} = (C_c - A_c)/(Ma_m^2) \tag{322}$$

where the conventional moments of inertia are given by³⁵

$$C_c = \int\left(x_m^2 + y_m^2\right)dM \tag{323}$$
$$A_c = \int\left(x_m^2 + z_m^2\right)dM = \int\left(y_m^2 + z_m^2\right)dM \tag{324}$$

Thus the conventional calculations are done using the measured coordinates of geodesy, $r_m$, $x_m$, $y_m$, $z_m$ and $\chi_m$.

Comparing  equation  (320)  with  equation  (299)  gives 
5. MINE SHAFT, BOREHOLE, TOWER AND EOTVOS EXPERIMENTS. This section considers the apparent deviations from Newtonian gravity that have recently been reported in the literature.¹⁻²⁵ These discrepancies have been found in laboratory Eotvos experiments, which examine the validity of Newton's gravitation law over short ranges, both for deviations from the inverse square law and for a possible dependence on the composition (baryon number) of the attracting masses.⁵⁻¹⁵ Deviations from the inverse square law have also been found in measurements of the acceleration of gravity over vertical distances of hundreds of meters in mine shaft, borehole and tower experiments.¹⁻³,²⁴,²⁵ This section gives an analysis of these apparent discrepancies based on the broken symmetry of space induced by a pressure field. A spherical earth is assumed for the calculations done here, so that the acceleration of gravity and the effective radial gravitational constant are given by equations (64) and (75), respectively, in terms of the internal phase angle $\theta_r$ of the radial coordinates.
A. Small Argument Approximation to the Equilibrium Equation for the Internal Phase Angles of the Radial Coordinates.

For the case where $\theta_P$ varies slowly with radial distance, the following approximation can be written for equation (57)

$$\theta_P + P\,\frac{\partial\theta_P/\partial r}{\partial P/\partial r} \approx -2\theta_r \tag{327}$$


The  solution  of  equation  (327)  determines  0r(r)  in  terms  of  P(r)  and  0p(r)  and 
hence  by  equations  (64)  and  (75)  the  acceleration  of  gravity  and  the  effective 
gravitational "constant  are  also  obtained.  As  a  simple  example  consider  the  case 


$$P = P(0)\,e^{-\alpha r} \tag{328}$$

$$\theta_p = \theta_p(0)\,e^{-\beta r} \tag{329}$$

where $P(0)$ and $\theta_p(0)$ are the pressure and its internal phase angle at the center of the earth. Combining equations (327) through (329) gives

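$$\theta_r = -\tfrac{1}{2}\,\frac{\alpha+\beta}{\alpha}\,\theta_p(0)\,e^{-\beta r} = -\tfrac{1}{2}\,\frac{\alpha+\beta}{\alpha}\,\theta_p \tag{330}$$

At the center of the earth $r = 0$, and equation (330) gives

$$\theta_r(0) = -\tfrac{1}{2}\,\frac{\alpha+\beta}{\alpha}\,\theta_p(0) \tag{331}$$

At the earth's surface $r = R$, and

$$\theta_r(R) = -\tfrac{1}{2}\,\frac{\alpha+\beta}{\alpha}\,\theta_p(0)\,e^{-\beta R} = -\tfrac{1}{2}\,\frac{\alpha+\beta}{\alpha}\,\theta_p(R) \tag{332}$$

where $\theta_p(R)$ is the internal phase angle of the pressure at the earth's surface, given by

$$\theta_p(R) = \theta_p(0)\,e^{-\beta R} \tag{333}$$

Note also that the pressure at the earth's surface is given by

$$P(R) = P(0)\,e^{-\alpha R} \tag{334}$$

Equations (333) and (334) can be used to evaluate $\alpha$ and $\beta$.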
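The exponential example can be sketched in Python using hypothetical normalized inputs: center values $P(0) = 1$ and $\theta_p(0) = 0.2$ rad, surface values $P(R) = 0.5$ and $\theta_p(R) = 0.1$ rad, and $R = 1$. None of these numbers come from the paper. The sketch recovers $\alpha$ and $\beta$ from equations (333) and (334) and evaluates $\theta_r$ from equation (330). It then checks by central finite differences that the result satisfies equation (327).

```python
import math


def exp_profile_constants(P0, PR, thp0, thpR, R):
    """Solve (334) P(R) = P(0) exp(-alpha R) and (333) theta_p(R) = theta_p(0) exp(-beta R)."""
    alpha = math.log(P0 / PR) / R
    beta = math.log(thp0 / thpR) / R
    return alpha, beta


def theta_r_exp(r, alpha, beta, thp0):
    """Equation (330): theta_r = -1/2 (alpha + beta)/alpha * theta_p(0) exp(-beta r)."""
    return -0.5 * (alpha + beta) / alpha * thp0 * math.exp(-beta * r)


def residual_327(r, alpha, beta, P0, thp0, dr=1e-6):
    """theta_p + P (dtheta_p/dr)/(dP/dr) + 2 theta_r via central differences.

    If theta_r solves (327), this is zero up to rounding and truncation error."""
    P = lambda s: P0 * math.exp(-alpha * s)      # equation (328)
    thp = lambda s: thp0 * math.exp(-beta * s)   # equation (329)
    dP = (P(r + dr) - P(r - dr)) / (2.0 * dr)
    dthp = (thp(r + dr) - thp(r - dr)) / (2.0 * dr)
    return thp(r) + P(r) * dthp / dP + 2.0 * theta_r_exp(r, alpha, beta, thp0)


# Hypothetical normalized inputs (illustration only)
alpha, beta = exp_profile_constants(P0=1.0, PR=0.5, thp0=0.2, thpR=0.1, R=1.0)
theta_center = theta_r_exp(0.0, alpha, beta, 0.2)   # equation (331)
theta_surface = theta_r_exp(1.0, alpha, beta, 0.2)  # equation (332)
```

With these inputs $\alpha = \beta = \ln 2$. Both values of $\theta_r$ come out negative, which is consistent with the sign stated for $\theta_r$ inside a gravitating body.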

For the case of a linear variation of the pressure and its internal phase angle of the form

$$P = P(0) - \alpha r \tag{335}$$

$$\theta_p = \theta_p(0) - \beta r \tag{336}$$

it follows from equation (327) that

$$\theta_r = -\tfrac{1}{2}\left[\theta_p(0) + \frac{\beta}{\alpha}P(0)\right] + \beta r \tag{337}$$

The values of $\alpha$ and $\beta$ can be obtained by evaluating equations (335) and (336) at the earth's surface as follows:

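$$P(R) = P(0) - \alpha R \tag{338}$$

$$\theta_p(R) = \theta_p(0) - \beta R \tag{339}$$

At the center of the earth equation (337) gives

$$\theta_r(0) = -\tfrac{1}{2}\left[\theta_p(0) + \frac{\beta}{\alpha}P(0)\right] \tag{340}$$

while for the surface of the earth

$$\theta_r(R) = -\tfrac{1}{2}\left[\theta_p(0) + \frac{\beta}{\alpha}P(0)\right] + \beta R \tag{341}$$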


Equations (330) and (337) show that in general $\theta_r < 0$ within a gravitating body.
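The linear case can be sketched the same way. This example uses hypothetical normalized inputs ($P(0) = 1$, $P(R) = 0.1$, $\theta_p(0) = 0.2$ rad, $\theta_p(R) = 0.1$ rad, $R = 1$), chosen only for illustration. The sketch solves equations (338) and (339) for $\alpha$ and $\beta$ and evaluates equation (337). It then checks equation (327) directly, using the constant slopes $\partial P/\partial r = -\alpha$ and $\partial\theta_p/\partial r = -\beta$.

```python
def linear_constants(P0, PR, thp0, thpR, R):
    """Equations (338) and (339) solved for alpha and beta."""
    return (P0 - PR) / R, (thp0 - thpR) / R


def theta_r_linear(r, alpha, beta, P0, thp0):
    """Equation (337): theta_r = -1/2 [theta_p(0) + beta P(0)/alpha] + beta r."""
    return -0.5 * (thp0 + beta * P0 / alpha) + beta * r


def residual_327_linear(r, alpha, beta, P0, thp0):
    """theta_p + P (dtheta_p/dr)/(dP/dr) + 2 theta_r for the linear profiles (335)-(336)."""
    P = P0 - alpha * r        # equation (335)
    thp = thp0 - beta * r     # equation (336)
    return thp + P * (-beta) / (-alpha) + 2.0 * theta_r_linear(r, alpha, beta, P0, thp0)


# Hypothetical normalized inputs (illustration only)
alpha, beta = linear_constants(P0=1.0, PR=0.1, thp0=0.2, thpR=0.1, R=1.0)
theta_center = theta_r_linear(0.0, alpha, beta, 1.0, 0.2)   # equation (340)
theta_surface = theta_r_linear(1.0, alpha, beta, 1.0, 0.2)  # equation (341)
```

For these inputs $\theta_r$ increases from about $-0.156$ rad at the center to about $-0.056$ rad at the surface and stays negative throughout.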

B.  Theory  of  the  Apparent  Non-Newtonian  Behaviour  of  Gravity 
in  Mine  Shaft,  Borehole  and  Tower  Experiments. 

Measurements of the variation of the acceleration of gravity up the heights of a tower and down the depths of a mine shaft or borehole have indicated discrepancies with the inverse square law of Newtonian gravity. A possible explanation of these discrepancies has been given by assuming the validity of Newtonian gravitation in matter with broken internal symmetries.28 The result is that the acceleration of gravity for a spherical earth is given by equation (64). To apply this equation to mine shaft, borehole and tower gravity measurements, it is first necessary to calculate the internal phase angle $\theta_r$ from equation (327). Let the coordinate measured up a tower from the earth's surface be designated by $h$, so that the distance from the center of the earth to a point on the tower is given by

$$r = R + h \tag{342}$$

where $R$ is the magnitude of the earth's radius at the base of the tower. Equation (342) applies to a mine shaft or borehole if $h < 0$. Combining equations (327) and (342) gives

$$\theta_p + P\,\frac{\partial\theta_p/\partial h}{\partial P/\partial h} = -2\theta_r \tag{343}$$

as the equation for determining $\theta_r$.

The magnitude of the atmospheric pressure at points on a tower, mine shaft or borehole can be written in its simplest form by the following linear equation

$$P = P(R) - \rho(R)g(R)h = \rho(R)g(R)(h_a - h) \tag{344}$$

where the equivalent height of the atmosphere is given by

$$h_a = P(R)/[\rho(R)g(R)] \tag{345}$$

and where $P(R)$, $\rho(R)$ and $g(R)$ are the magnitudes of the pressure, air density and acceleration of gravity, respectively, at the earth's surface. The measured values of these quantities are given by $P_m(R) = P(R)\cos\theta_p(R)$, $\rho_m(R) = \rho(R)\cos\theta_\rho(R)$ and $g_m(R) = g(R)\cos\theta_g(R)$, respectively. The measured pressure is given by $P_m = P\cos\theta_p$. The internal phase angle of the pressure will be written in a form similar to equation (344) as follows

$$\theta_p = \theta_p(R) - \eta h \tag{346}$$

Equations (344) and (346) are the simplest equations that can be chosen to describe the variation with height (or depth) of the atmospheric pressure and its internal phase angle. Strictly speaking, $P$, $\theta_p$ and $\theta_r$ should be determined simultaneously from equations (56) and (57) and the renormalized state equation, which is given by a solution of the complex number relativistic trace equation (2). Such a simultaneous solution is difficult to obtain. Equations (344) and (346) represent a crude solution to equations (2), (56) and (57). These assumed solutions will now be used to obtain $\theta_r$ from equation (343). Combining equations (343), (344) and (346) gives

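$$\theta_r = -\tfrac{1}{2}\theta_p(R) - \tfrac{1}{2}\eta(h_a - 2h) = -\tfrac{1}{2}\left[\theta_p(R) + \eta h_a\right] + \eta h \tag{347}$$

At the earth's surface

$$\theta_r(R) = -\tfrac{1}{2}\left[\theta_p(R) + \eta h_a\right] \tag{348}$$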
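A small Python sketch of the linear atmospheric model (344) to (348). The sea-level pressure, density and gravity are assumed International Standard Atmosphere values, not values taken from the paper. The phase inputs $\theta_p(R) = 0.1$ rad and $\eta = 10^{-6}$ rad/m are hypothetical. With these assumptions, equation (345) gives $h_a \approx 8.4$ km. This is the linear model's equivalent height, not necessarily the same quantity as the roughly 7 km attenuation distance quoted later in this section. The residual function checks that equation (347) satisfies equation (343).

```python
# Assumed ISA sea-level values (not from the paper)
P_R = 101325.0    # surface pressure magnitude P(R), Pa
RHO_R = 1.225     # surface air density rho(R), kg/m^3
G_R = 9.80665     # surface gravity g(R), m/s^2


def equivalent_height(p, rho, g):
    """Equation (345): h_a = P(R) / [rho(R) g(R)]."""
    return p / (rho * g)


def pressure(h, h_a):
    """Equation (344): P = rho(R) g(R) (h_a - h)."""
    return RHO_R * G_R * (h_a - h)


def theta_p(h, theta_p_R, eta):
    """Equation (346): theta_p = theta_p(R) - eta h."""
    return theta_p_R - eta * h


def theta_r(h, theta_p_R, eta, h_a):
    """Equation (347): theta_r = -1/2 [theta_p(R) + eta h_a] + eta h."""
    return -0.5 * (theta_p_R + eta * h_a) + eta * h


def residual_343(h, theta_p_R, eta, h_a, dh=1e-3):
    """theta_p + P (dtheta_p/dh)/(dP/dh) + 2 theta_r; zero if (347) solves (343)."""
    dP = (pressure(h + dh, h_a) - pressure(h - dh, h_a)) / (2.0 * dh)
    dthp = (theta_p(h + dh, theta_p_R, eta) - theta_p(h - dh, theta_p_R, eta)) / (2.0 * dh)
    return (theta_p(h, theta_p_R, eta) + pressure(h, h_a) * dthp / dP
            + 2.0 * theta_r(h, theta_p_R, eta, h_a))


h_a = equivalent_height(P_R, RHO_R, G_R)   # about 8.43e3 m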
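```

Because $\eta > 0$ in this sketch, $\theta_r$ rises with $h$: it is more negative down a shaft ($h < 0$) than up a tower.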

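Consider now the case where the pressure and its internal phase vary according to the following exponential forms

$$P = P(R)\,e^{-\delta h} \tag{349}$$

$$\theta_p = \theta_p(R)\,e^{-\kappa h} \tag{350}$$

Combining equations (343), (349) and (350) gives

$$\theta_r = -\tfrac{1}{2}\,\frac{\delta+\kappa}{\delta}\,\theta_p(R)\,e^{-\kappa h} \tag{351}$$

and the value at the earth's surface is

$$\theta_r(R) = -\tfrac{1}{2}\,\frac{\delta+\kappa}{\delta}\,\theta_p(R) \tag{352}$$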

The values of $\theta_r$ determine the apparent deviation of the acceleration of gravity from Newton's law of gravity, as shown in equations (64) and (75). Figures 2 and 3 sketch the variation of $P$ and $\theta_p$ for the solid earth, for the ocean, and for the atmosphere in an air-filled mine shaft or borehole or adjacent to a tower. The expected variation of $\theta_r$ in the solid earth, ocean and atmosphere is shown in Figure 4, while Figure 5 shows the corresponding variation of $G_r$ as given by equation (75). Figure 5 shows that local measurements of the acceleration of gravity will yield values of $G_r$ that are less than the value of $G$. The value of $G$ is associated with the complete symmetry of time and space, and as such it cannot be measured directly. The result $G_r < G$ is due to the effects of the complex number atmospheric pressure in the case of mine shaft, borehole and tower experiments, and to the complex number water pressure for measurements of $G_r$ carried out in the depths of the ocean. For measurements of $G_r$ in a mine shaft, in a borehole or up a tower, the characteristic range over which $G_r$ varies should be about 7 km. This is because the atmospheric pressure decreases with height over a characteristic attenuation distance of about 7 km.

Equation (75) and Figure 5 also show that, were it possible to measure the variation of the acceleration of gravity with depth in solid rock, the values of $G_r$ would be less than those measured in an air-filled mine shaft or in the ocean. This is because $P$, $\theta_p$ and $|\theta_r|$ are larger in rock than in the ocean or in an air-filled mine shaft at a corresponding depth. Measurements of $G_r$ in the ocean should yield weaker gravity (smaller $G_r$) than corresponding measurements in an air-filled mine shaft or borehole. The reason is that values of $|\theta_r|$ in the ocean are larger than the corresponding values in an air-filled mine shaft or borehole at the same depth (see equation (75) and Figure 4).

C. Internal Phase Theory of the Eötvös Experiment and its
Relationship to Mine Shaft, Borehole and Tower Experiments.

This part of the paper describes a theoretical analysis of the Eötvös experiment in terms of Newtonian gravity and the broken symmetry internal phase angles of the relevant coordinates of the experiment. The Eötvös experiment has been thoroughly described in the literature, and only the briefest review is given in this paragraph.6-15 This experiment measures the horizontal force of gravity between two spheres of material that are suspended in close proximity to each other. Conventional Newtonian theory predicts that the measured gravity force depends on the inverse square of the separation distance and on the product of the masses of the two spheres. Recent experiments, however, suggest the possibility of composition dependent effects and deviations from the inverse square law.3-5

Consider now the Eötvös experiment from the perspective of the internal phase theory of coordinates. The two spheres can be oriented in the north-south direction, in which case their separation is described by a decrement of the zenith angle. They can also be oriented in the east-west direction, in which case their separation is described by a decrement of the azimuthal angle. For the north-south orientation the complex number distance between the two spheres is written as

$$d\tilde{\ell}_\psi = d\ell_\psi\,e^{j\theta_{\ell\psi}} = \tilde{r}\,d\tilde{\psi} = r\,e^{j(\theta_r+\theta_\psi)}\left(d\psi + j\psi\,d\theta_\psi\right) \tag{353}$$

which gives

$$d\ell_\psi = r\sec\theta_{\psi,\psi}\,d\psi \tag{354}$$

$$\theta_{\ell\psi} = \theta_r + \theta_\psi + \theta_{\psi,\psi} \tag{355}$$

The symbols are defined as follows:
- $\tilde r = \tilde R + \tilde h$ is the complex number distance of the two spheres from the center of the earth.
- $\tilde R$ is the complex number earth's radius at the position of the two spheres.
- $\tilde h$ is the complex number distance above (or below) the earth's surface at which the Eötvös experiment is conducted.
- $d\tilde\psi$ is the complex number zenith angle separation of the two spheres situated in the north-south direction (longitudinal plane), given by equation (140).

The measured distance between the two spheres situated in the north-south orientation is given by

$$d\ell_{\psi m} = d\ell_\psi\cos\theta_{\ell\psi} \tag{356}$$


For the east-west orientation the complex number distance between the two spheres is

$$d\tilde{\ell}_\phi = d\ell_\phi\,e^{j\theta_{\ell\phi}} = \tilde{r}\sin\tilde{\psi}\,d\tilde{\phi} \tag{357}$$

where $d\tilde\phi$ is the complex number azimuthal angle separation of the two spheres, which are situated in a plane of latitude $\lambda = \pi/2 - \psi$. Therefore

$$d\ell_\phi = r S_\psi\sec\theta_{\phi,\phi}\,d\phi \tag{358}$$

$$\theta_{\ell\phi} = \theta_r + \theta_{s\psi} + \theta_\phi + \theta_{\phi,\phi} \tag{359}$$

where $S_\psi$ and $\theta_{s\psi}$ are given by equations (143) and (144), respectively, and $\theta_{\phi,\phi}$ is given by equation (141). The measured distance between the two spheres situated in the east-west direction is given by

$$d\ell_{\phi m} = d\ell_\phi\cos\theta_{\ell\phi} \tag{360}$$

The gravitational force between the two spheres situated in the north-south direction is

$$\tilde F_\psi = -Gm^2/(d\tilde\ell_\psi)^2 = -Gm^2/(d\ell_\psi)^2\,e^{-2j\theta_{\ell\psi}} \tag{361}$$

where $m$ is the mass of one sphere. The measured gravitational force between the two spheres in the north-south direction is

$$F_{\psi m} = -Gm^2/(d\ell_\psi)^2\cos(2\theta_{\ell\psi}) = -Gm^2/(d\ell_{\psi m})^2\cos(2\theta_{\ell\psi})\cos^2\theta_{\ell\psi} \tag{362}$$

The conventional calculation of the Newtonian gravitational force between the two Eötvös spheres gives

$$F_{\psi c} = -Gm^2/(d\ell_{\psi m})^2 \tag{363}$$

Therefore the difference between the measured and conventionally predicted forces in the north-south direction is

$$\Delta F_\psi = F_{\psi m} - F_{\psi c} = \frac{Gm^2}{(d\ell_{\psi m})^2}\left[1 - \cos^2\theta_{\ell\psi}\cos(2\theta_{\ell\psi})\right] \approx 3\theta_{\ell\psi}^2\,\frac{Gm^2}{(d\ell_{\psi m})^2} \tag{364}$$
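The last step of equation (364) replaces the bracket $1 - \cos^2\theta\cos(2\theta)$ by $3\theta^2$. A quick Python check of that small-angle step follows; the angles are arbitrary illustrative values. Expanding the bracket gives $3\theta^2 - 3\theta^4 + \cdots$, so the relative error of the approximation is about $\theta^2$.

```python
import math


def force_discrepancy_factor(theta_l):
    """Bracket of (364) and (368): 1 - cos^2(theta) cos(2 theta).

    Multiplying by G m^2 / (dl_m)^2 gives Delta F = F_m - F_c."""
    return 1.0 - math.cos(theta_l) ** 2 * math.cos(2.0 * theta_l)


def small_angle_factor(theta_l):
    """Leading-order approximation used in (364): 3 theta^2."""
    return 3.0 * theta_l ** 2


for th in (0.01, 0.05, 0.1):   # illustrative angles, radians
    exact = force_discrepancy_factor(th)
    approx = small_angle_factor(th)
    print(f"theta={th:4.2f}  exact={exact:.6e}  3*theta^2={approx:.6e}")
```

The same factor applies to the east-west force difference (368) with $\theta_{\ell\phi}$ in place of $\theta_{\ell\psi}$.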
In a similar fashion, the complex number gravitational force and the measured force between the two spheres situated in the east-west orientation are given respectively by

$$\tilde F_\phi = -Gm^2/(d\tilde\ell_\phi)^2 = -Gm^2/(d\ell_\phi)^2\,e^{-2j\theta_{\ell\phi}} \tag{365}$$

$$F_{\phi m} = -Gm^2/(d\ell_\phi)^2\cos(2\theta_{\ell\phi}) = -Gm^2/(d\ell_{\phi m})^2\cos(2\theta_{\ell\phi})\cos^2\theta_{\ell\phi} \tag{366}$$

The conventionally calculated gravitational force is given by

$$F_{\phi c} = -Gm^2/(d\ell_{\phi m})^2 \tag{367}$$

and the difference between equations (366) and (367) is

$$\Delta F_\phi = F_{\phi m} - F_{\phi c} = \frac{Gm^2}{(d\ell_{\phi m})^2}\left[1 - \cos^2\theta_{\ell\phi}\cos(2\theta_{\ell\phi})\right] \approx 3\theta_{\ell\phi}^2\,\frac{Gm^2}{(d\ell_{\phi m})^2} \tag{368}$$

Measuring the discrepancies between the measured and predicted gravity forces in the north-south and east-west orientations of the Eötvös experiment will give the values of $\theta_{\ell\psi}$ and $\theta_{\ell\phi}$.


From equations (75), (362) and (366) it follows that there are three effective gravitational constants. Each is associated with a direction ($r$, $\psi$ or $\phi$) of measurement of the gravitational force:

$$G_r = G\cos(2\theta_r)\cos^2\theta_r \approx G(1 - 3\theta_r^2 + \cdots) \tag{369}$$

$$G_\psi = G\cos(2\theta_{\ell\psi})\cos^2\theta_{\ell\psi} \approx G(1 - 3\theta_{\ell\psi}^2 + \cdots) \tag{370}$$

$$G_\phi = G\cos(2\theta_{\ell\phi})\cos^2\theta_{\ell\phi} \approx G(1 - 3\theta_{\ell\phi}^2 + \cdots) \tag{371}$$

Due to the internal phase structure of the coordinates, the effective Newtonian gravitational constant has three distinct values along the three orthogonal directions at a point on the earth's surface. Because $|\theta_\psi|$ and $|\theta_\phi|$ are expected to be smaller than $|\theta_r|$, it follows from equations (355) and (359) that for the same height (or depth)


$$|\theta_{\ell\phi}| < |\theta_{\ell\psi}| < |\theta_r|, \qquad G_\phi > G_\psi > G_r \qquad \text{if } \theta_r < 0,\ \theta_\phi > 0,\ \theta_\psi > 0 \tag{372}$$

$$|\theta_{\ell\phi}| > |\theta_{\ell\psi}| > |\theta_r|, \qquad G_\phi < G_\psi < G_r \qquad \text{if } \theta_r < 0,\ \theta_\phi < 0,\ \theta_\psi < 0 \tag{373}$$

where for both cases $\theta_{\ell\phi} < 0$ and $\theta_{\ell\psi} < 0$. According to this theory, the signs of $\theta_\psi$ and $\theta_\phi$ decide whether the values of $G_\psi$ and $G_\phi$ measured by the Eötvös experiment are greater or less than the value of $G_r$ determined by gravimeter measurements in a mine shaft, borehole or tower experiment. References 1 through 3 suggest that the conditions (373) are correct. References 5 and 25 suggest that equation (372) gives the correct relationship between $G_\psi$, $G_\phi$ and $G_r$. The experimental results are not yet clear enough to decide between $\theta_\psi > 0$ and $\theta_\phi > 0$ on the one hand, and $\theta_\psi < 0$ and $\theta_\phi < 0$ on the other.
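The two orderings (372) and (373) can be reproduced numerically from equations (355), (359) and (369) through (371). In this Python sketch, every angular phase angle ($\theta_\psi$, $\theta_{\psi,\psi}$, $\theta_{s\psi}$, $\theta_\phi$, $\theta_{\phi,\phi}$) is set to a common illustrative value $y$, which is a simplifying assumption. $\theta_r = -0.1$ rad is also an illustrative input.

```python
import math


def g_eff_ratio(theta):
    """G_eff / G = cos(2 theta) cos^2(theta), equations (369)-(371)."""
    return math.cos(2.0 * theta) * math.cos(theta) ** 2


def directional_constants(th_r, y):
    """Return (G_r, G_psi, G_phi) / G.

    Assumption (illustration only): theta_psi = theta_psipsi = theta_spsi
    = theta_phi = theta_phiphi = y."""
    th_l_psi = th_r + y + y        # equation (355)
    th_l_phi = th_r + y + y + y    # equation (359)
    return g_eff_ratio(th_r), g_eff_ratio(th_l_psi), g_eff_ratio(th_l_phi)


# Positive angular phase angles: expect the ordering of (372), G_phi > G_psi > G_r
g_r, g_psi, g_phi = directional_constants(-0.1, +0.016)
# Negative angular phase angles: expect the ordering of (373), G_phi < G_psi < G_r
g_r2, g_psi2, g_phi2 = directional_constants(-0.1, -0.004)
```

With $\theta_r < 0$, a positive $y$ pulls $\theta_{\ell\psi}$ and $\theta_{\ell\phi}$ toward zero and raises the corresponding effective constants. A negative $y$ does the opposite.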


On account of equations (369) through (371), $G_\psi$ and $G_\phi$ measured by an Eötvös experiment should vary with depth (or height) in a similar way to $G_r$. This is shown in Figure 5. The validity of equations (370) and (371) could be tested by conducting Eötvös experiments in the depths of the ocean, down mine shafts, or up a tower, to see whether $G_\psi$ and $G_\phi$ vary in the same sense as $G_r$. For an Eötvös experiment on a tower or in an air-filled mine shaft, the characteristic length over which $G_\psi$ and $G_\phi$ change should be about 7 km. This is the characteristic variation distance of the atmospheric pressure in the vertical direction.43 The characteristic distance for the decrease of $G_\psi$ and $G_\phi$ with depth in the ocean (or in the solid earth, if such experiments were possible) should be much larger than 7 km, because the pressure changes in these cases occur over hundreds and thousands of kilometers.35-40,43-45 Another possible test would be to perform the Eötvös experiment in a pressure chamber and measure the pressure dependence of $G_\psi$ and $G_\phi$. This would verify whether $G_\psi$ and $G_\phi$ are decreasing functions of the ambient pressure, as suggested by equations (370) and (371). In any case, equations (369) through (371) show that local measurements of gravity do not directly determine the Newtonian gravitational constant $G$. Approximate values of $G$ can be determined from satellite or solar system measurements, where the effects of ambient pressure are negligible. In this case, however, the values $\theta_r$, $\theta_\psi$ and $\theta_\phi$ of the broken symmetry vacuum must be taken into consideration. Thus even for measurements in the vacuum, $G$ cannot be directly measured.


Consider the variation of the gravitational constant $G_r$ given by equation (369), from which it follows that

$$G_r(R-h) - G_r(R) \approx 3G\left[\theta_r^2(R) - \theta_r^2(R-h)\right] \tag{374}$$

$$G_r(R+h) - G_r(R) \approx 3G\left[\theta_r^2(R) - \theta_r^2(R+h)\right] \tag{375}$$

A Taylor series expansion of $\theta_r$ gives

$$\theta_r(R-h) = \theta_r(R) - h\,\partial\theta_r/\partial h + \cdots \tag{376}$$

$$\theta_r(R+h) = \theta_r(R) + h\,\partial\theta_r/\partial h + \cdots \tag{377}$$

Combining equations (374) through (377) gives, for small $h$,

$$[G_r(R-h) - G_r(R)]/G \approx -s_r h \tag{378}$$

$$[G_r(R+h) - G_r(R)]/G \approx s_r h \tag{379}$$

where

$$s_r = 6\,|\theta_r(R)|\,\partial\theta_r/\partial r\big|_R > 0 \tag{380}$$

remembering that $\theta_r < 0$. Combining equations (378) and (379) gives

$$G_r(R-h) < G_r(R) < G_r(R+h) \tag{381}$$

as shown in Figure 5. Note that $G_r(R)$ is not the value of the gravitational constant that is measured at the earth's surface by the Eötvös experiment. The value of the gravitational constant measured by the Eötvös experiment is given by $G_\psi(R)$ or $G_\phi(R)$.
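A Python sketch comparing the linearized result (378) to (380) with the exact expression (369). All inputs are hypothetical: $\theta_r(R) = -0.1$ rad, a positive slope $\partial\theta_r/\partial r = 10^{-6}$ rad/m, and $h = 100$ m. The first-order Taylor model of equations (376) and (377) is used for $\theta_r(R \pm h)$.

```python
import math


def g_eff_ratio(theta):
    """G_r / G = cos(2 theta_r) cos^2(theta_r), equation (369)."""
    return math.cos(2.0 * theta) * math.cos(theta) ** 2


# Hypothetical inputs (illustration only, not the paper's data)
TH_R = -0.1        # theta_r(R), rad
DTH_DR = 1.0e-6    # d theta_r / dr at R, rad/m, assumed positive
H = 100.0          # vertical offset h, m

s_r = 6.0 * abs(TH_R) * DTH_DR   # equation (380)


def theta_r_at(offset):
    """First-order Taylor model of theta_r(R + offset), equations (376)-(377)."""
    return TH_R + offset * DTH_DR


exact_up = g_eff_ratio(theta_r_at(+H)) - g_eff_ratio(TH_R)     # [G_r(R+h) - G_r(R)]/G
exact_down = g_eff_ratio(theta_r_at(-H)) - g_eff_ratio(TH_R)   # [G_r(R-h) - G_r(R)]/G
linear_up, linear_down = s_r * H, -s_r * H                     # equations (379), (378)
```

For these inputs the linear estimate agrees with the exact difference to about 2%. The sign pattern reproduces the ordering (381).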

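The Eötvös experiment can be done at various depths and heights. From equations (370) and (371) it follows that

$$G_\psi(R\pm h) - G_\psi(R) \approx 3G\left[\theta_{\ell\psi}^2(R) - \theta_{\ell\psi}^2(R\pm h)\right] \tag{382}$$

$$G_\phi(R\pm h) - G_\phi(R) \approx 3G\left[\theta_{\ell\phi}^2(R) - \theta_{\ell\phi}^2(R\pm h)\right] \tag{383}$$

A Taylor series is used to obtain

$$\theta_{\ell\psi}(R\pm h) = \theta_{\ell\psi}(R) \pm h\,\partial\theta_{\ell\psi}/\partial r\big|_R + \cdots \tag{384}$$

$$\theta_{\ell\phi}(R\pm h) = \theta_{\ell\phi}(R) \pm h\,\partial\theta_{\ell\phi}/\partial r\big|_R + \cdots \tag{385}$$

Combining equations (382) through (385) gives, for small $h$,

$$[G_\psi(R\pm h) - G_\psi(R)]/G \approx \pm s_\psi h \tag{386}$$

$$[G_\phi(R\pm h) - G_\phi(R)]/G \approx \pm s_\phi h \tag{387}$$

where

$$s_\psi = 6\,|\theta_{\ell\psi}(R)|\,\partial\theta_{\ell\psi}/\partial r\big|_R > 0 \tag{388}$$

$$s_\phi = 6\,|\theta_{\ell\phi}(R)|\,\partial\theta_{\ell\phi}/\partial r\big|_R > 0 \tag{389}$$

because $\theta_{\ell\psi} < 0$ and $\theta_{\ell\phi} < 0$. Therefore

$$G_\psi(R-h) < G_\psi(R) < G_\psi(R+h) \tag{390}$$

$$G_\phi(R-h) < G_\phi(R) < G_\phi(R+h) \tag{391}$$

Thus $G_\psi$ and $G_\phi$ are increasing functions of height.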


Eötvös experiments done at the same height but in the east-west and north-south directions give the following difference, obtained from equations (370) and (371):

$$\begin{aligned}(G_\psi - G_\phi)/G &\approx 3\left(\theta_{\ell\phi}^2 - \theta_{\ell\psi}^2\right)\\ &= 3\left[(\theta_r + \theta_{s\psi} + \theta_\phi + \theta_{\phi,\phi})^2 - (\theta_r + \theta_\psi + \theta_{\psi,\psi})^2\right]\\ &\approx 6\theta_r(\theta_{s\psi} + \theta_\phi + \theta_{\phi,\phi} - \theta_\psi - \theta_{\psi,\psi})\\ &\approx 6\theta_r\theta_\phi\end{aligned} \tag{392}$$

Because $\theta_r < 0$, it follows that

$$G_\phi > G_\psi \qquad \text{if } \theta_\phi > 0 \tag{393}$$

$$G_\phi < G_\psi \qquad \text{if } \theta_\phi < 0 \tag{394}$$

Measuring the east-west/north-south asymmetry will therefore give the sign of the internal phase angle of the angular coordinates. If $\theta_r$ is known, it will also give its approximate magnitude.

In general the Eötvös experiment is done at the earth's surface and determines $G_\psi(R)$ and $G_\phi(R)$. The vertical gravity measurements using gravimeters are done down mine shafts and boreholes or up towers, and determine $G_r(R\pm h)$. Consider a comparison of $G_r(R\pm h)$ and $G_\psi(R)$, which can be obtained using equations (355), (369) and (370):

$$\begin{aligned}[G_r(R\pm h) - G_\psi(R)]/G &\approx 3\left[\theta_{\ell\psi}^2(R) - \theta_r^2(R\pm h)\right]\\ &= 3\left\{[\theta_r(R) + \theta_\psi(R) + \theta_{\psi,\psi}(R)]^2 - \theta_r^2(R\pm h)\right\}\end{aligned} \tag{395}$$

Combining equations (376), (377) and (395) gives

$$\begin{aligned}[G_r(R\pm h) - G_\psi(R)]/G &\approx \pm s_r h + 6\theta_r(R)[\theta_\psi(R) + \theta_{\psi,\psi}(R)] + 3[\theta_\psi(R) + \theta_{\psi,\psi}(R)]^2\\ &\approx \pm s_r h + 6\theta_r(R)[\theta_\psi(R) + \theta_{\psi,\psi}(R)]\end{aligned} \tag{396}$$

From equation (396) it follows that

$$\begin{aligned}[G_r(R) - G_\psi(R)]/G &\approx 6\theta_r(R)[\theta_\psi(R) + \theta_{\psi,\psi}(R)] + 3[\theta_\psi(R) + \theta_{\psi,\psi}(R)]^2\\ &\approx 6\theta_r(R)[\theta_\psi(R) + \theta_{\psi,\psi}(R)]\end{aligned} \tag{397}$$

It follows from equations (381), (396) and (397) that

$$G_r(R-h) < G_r(R) < G_r(R+h) < G_\psi(R) < G \qquad \text{if } \theta_\psi(R) > 0 \tag{398}$$

$$G_\psi(R) < G_r(R-h) < G_r(R) < G_r(R+h) < G \qquad \text{if } \theta_\psi(R) < 0 \tag{399}$$

The inequalities in equations (398) and (399) hold only for small $h$.
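Now consider the case of the east-west oriented Eötvös experiment. Combining equations (359), (369) and (371) gives

$$\begin{aligned}[G_r(R\pm h) - G_\phi(R)]/G &\approx 3\left[\theta_{\ell\phi}^2(R) - \theta_r^2(R\pm h)\right]\\ &= 3\left\{[\theta_r(R) + \theta_{s\psi}(R) + \theta_\phi(R) + \theta_{\phi,\phi}(R)]^2 - \theta_r^2(R\pm h)\right\}\end{aligned} \tag{400}$$

Combining equations (376), (377) and (400) gives

$$\begin{aligned}[G_r(R\pm h) - G_\phi(R)]/G &\approx \pm s_r h + 6\theta_r(R)[\theta_{s\psi}(R) + \theta_\phi(R) + \theta_{\phi,\phi}(R)] + 3[\theta_{s\psi}(R) + \theta_\phi(R) + \theta_{\phi,\phi}(R)]^2\\ &\approx \pm s_r h + 6\theta_r(R)[\theta_{s\psi}(R) + \theta_\phi(R) + \theta_{\phi,\phi}(R)]\end{aligned} \tag{401}$$

From equation (401) it follows that

$$\begin{aligned}[G_r(R) - G_\phi(R)]/G &\approx 6\theta_r(R)[\theta_{s\psi}(R) + \theta_\phi(R) + \theta_{\phi,\phi}(R)] + 3[\theta_{s\psi}(R) + \theta_\phi(R) + \theta_{\phi,\phi}(R)]^2\\ &\approx 6\theta_r(R)[\theta_{s\psi}(R) + \theta_\phi(R) + \theta_{\phi,\phi}(R)]\end{aligned} \tag{402}$$

From equations (381), (401) and (402) it follows that

$$G_r(R-h) < G_r(R) < G_r(R+h) < G_\phi(R) < G \qquad \text{if } \theta_\phi(R) > 0,\ \theta_\psi(R) > 0 \tag{403}$$

$$G_\phi(R) < G_r(R-h) < G_r(R) < G_r(R+h) < G \qquad \text{if } \theta_\phi(R) < 0,\ \theta_\psi(R) < 0 \tag{404}$$

The inequalities in equations (403) and (404) hold only for small $h$.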

Inequalities (398) and (403) are supported by the experimental data in References 5, 25 and 49. They suggest that mine shaft, borehole and tower determinations of the gravitational constant will be less than the gravitational constant determined by an Eötvös experiment performed at the earth's surface. On the other hand, the inequalities (399) and (404) are supported by the experimental data given in References 1 through 3. These indicate that the gravitational constant determined from a mine shaft, borehole or tower experiment will be larger than the value obtained by an Eötvös experiment conducted at the earth's surface. Only one set of data can be correct. When the correct set of experimental data is finally determined, the proper signs and approximate magnitudes of $\theta_\psi$ and $\theta_\phi$ will be fixed. None of the mine shaft, borehole, tower or Eötvös experiments directly measures the constant $G$.

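If it is possible to conduct an Eötvös experiment at various depths in a mine shaft or at different heights up a tower, it becomes important to compare $G_r$ with $G_\psi$ and $G_\phi$ at the same depth or height. From equations (369) through (371) it follows that

$$\begin{aligned}(G_r - G_\psi)/G &\approx 3\left(\theta_{\ell\psi}^2 - \theta_r^2\right)\\ &= 3\left[(\theta_r + \theta_\psi + \theta_{\psi,\psi})^2 - \theta_r^2\right]\\ &= 3\left[2\theta_r(\theta_\psi + \theta_{\psi,\psi}) + (\theta_\psi + \theta_{\psi,\psi})^2\right]\\ &\approx 6\theta_r(\theta_\psi + \theta_{\psi,\psi})\\ &\approx 12\theta_r\theta_\psi\end{aligned} \tag{405}$$

In a similar fashion

$$\begin{aligned}(G_r - G_\phi)/G &\approx 3\left(\theta_{\ell\phi}^2 - \theta_r^2\right)\\ &= 3\left[(\theta_r + \theta_{s\psi} + \theta_\phi + \theta_{\phi,\phi})^2 - \theta_r^2\right]\\ &= 3\left[2\theta_r(\theta_{s\psi} + \theta_\phi + \theta_{\phi,\phi}) + (\theta_{s\psi} + \theta_\phi + \theta_{\phi,\phi})^2\right]\\ &\approx 6\theta_r(\theta_{s\psi} + \theta_\phi + \theta_{\phi,\phi})\\ &\approx 18\theta_r\theta_\phi\end{aligned} \tag{406}$$

The inequalities (372) and (373) can also be deduced from equations (405) and (406).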


The variation of $G_r$, $G_\psi$ and $G_\phi$ with depth in a mine shaft and borehole, or with height up a tower, is due to Newtonian gravity in broken symmetry space combined with the variation of the broken symmetry atmospheric pressure. The variation of the broken symmetry atmospheric pressure with radial distance induces a variation of $\theta_r$, $\theta_\psi$ and $\theta_\phi$ with radial distance (Figure 4). This determines the non-Newtonian variation of $G_r$, $G_\psi$ and $G_\phi$ according to equations (369) through (371). This apparent non-Newtonian behaviour of gravity has been interpreted as being due to the existence of graviscalar (spin 0) and graviphoton (spin 1) component forces of gravity (the "fifth" and "sixth" forces).1-4,13-25 The hypothetical graviscalar mediates an attractive force while the hypothetical graviphoton mediates a repulsive force. Both are described by finite range Yukawa terms that are added to the ordinary Newtonian potential. But in fact these hypothetical forces are not required to describe the experimental results. The apparent non-Newtonian behaviour is due to ordinary Newtonian gravity in matter and space whose pressure and coordinate fields exhibit broken internal symmetries. The relative magnitudes of $G_r(R\pm h)$ and $G_\psi(R)$ or $G_\phi(R)$ are not related to new gravitational forces, but rather to the broken symmetry of pressure and spatial coordinates.

6. NUMERICAL VALUES OF THE INTERNAL PHASE ANGLES. This section determines numerical values of $\theta_r$, $\theta_\psi$, $\theta_\phi$ and $\theta_p$ within the atmosphere in the vicinity of the earth's surface. Two methods are used. The first method is based on the discrepancy between the measured and predicted values of the gravitational red shift of γ-rays in the Pound-Rebka-Snider experiment.32,33,46-48 The second method is based on the measured apparent departure of the force of gravity from Newtonian behaviour in mine shaft, borehole and tower experiments.1-3,5,25,49
A. Measurement of the Gravitational Red Shift

The experiments of Pound, Rebka and Snider measured the gravitational red shift of a γ-ray falling in the earth's gravitational field. The conventional expression for the red shift in frequency is given by32,33

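$$z_c = (\Delta\nu/\nu)_c = [V(r_{2m}) - V(r_{1m})]/c^2 \tag{407}$$

where the conventionally calculated gravitational potential is written as28

$$V(r_m) = -GM/r_m \tag{408}$$

and therefore

$$z_c = \frac{GM}{c^2}\left(\frac{1}{r_{1m}} - \frac{1}{r_{2m}}\right) \tag{409}$$

In this paper the theory of coordinates with internal phase requires a complex number red shift given by

$$\tilde z = z\,e^{j\theta_z} = \Delta\tilde\nu/\tilde\nu = [V(\tilde r_2) - V(\tilde r_1)]/c^2 = \frac{GM}{c^2}\left(\frac{1}{\tilde r_1} - \frac{1}{\tilde r_2}\right) \tag{410}$$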

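The measured gravitational red shift is given by the real part of equation (410):

$$z_m = z_R = z\cos\theta_z = \frac{GM}{c^2}\left(\frac{\cos\theta_{r1}}{r_1} - \frac{\cos\theta_{r2}}{r_2}\right) \tag{411}$$

while the imaginary part of equation (410) is

$$z_I = z\sin\theta_z = -\frac{GM}{c^2}\left(\frac{\sin\theta_{r1}}{r_1} - \frac{\sin\theta_{r2}}{r_2}\right) \tag{412}$$

where

$$\tilde r_1 = r_1\,e^{j\theta_{r1}} \tag{413}$$

$$\tilde r_2 = r_2\,e^{j\theta_{r2}} \tag{414}$$

Because $r_{1m} = r_1\cos\theta_{r1}$ and $r_{2m} = r_2\cos\theta_{r2}$, it follows from equation (411) that

$$z_m = \frac{GM}{c^2}\left(\frac{\cos^2\theta_{r1}}{r_{1m}} - \frac{\cos^2\theta_{r2}}{r_{2m}}\right) \tag{415}$$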


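The difference between the measured and conventionally calculated gravitational red shift is obtained from equations (409) and (415):

$$z_c - z_m = \frac{GM}{c^2}\left(\frac{\sin^2\theta_{r1}}{r_{1m}} - \frac{\sin^2\theta_{r2}}{r_{2m}}\right) \approx \frac{GM}{c^2}\left(\frac{\theta_{r1}^2}{r_{1m}} - \frac{\theta_{r2}^2}{r_{2m}}\right) \approx \theta_r^2\,z_c \tag{416}$$

The last step assumes $\theta_{r1} \approx \theta_{r2} \approx \theta_r$ over the height of the experiment.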
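A numerical check of equations (409) through (416) using Python complex arithmetic. The inputs are assumptions chosen for illustration: Earth's standard $GM$, a lower radius of 6371 km, an upper point 22.5 m higher, and a common phase angle $\theta_{r1} = \theta_{r2} = -0.1$ rad. The sketch builds $\tilde r_i = r_i e^{j\theta_{ri}}$ with $r_i = r_{im}/\cos\theta_{ri}$. It then checks that the real part of $\tilde z$ matches equation (415) and that the fractional discrepancy equals $\sin^2\theta_r \approx \theta_r^2$.

```python
import cmath
import math

GM = 3.986004418e14     # Earth's GM, m^3/s^2 (standard value, assumed)
C = 299_792_458.0       # speed of light, m/s
R1M = 6.371e6           # hypothetical measured radius of the lower point, m
R2M = R1M + 22.5        # hypothetical upper point 22.5 m higher, m
THETA = -0.1            # hypothetical theta_r1 = theta_r2, rad


def z_conventional(r1m, r2m):
    """Equation (409)."""
    return GM / C**2 * (1.0 / r1m - 1.0 / r2m)


def z_complex(r1m, r2m, th1, th2):
    """Equation (410) with r~_i = r_i exp(j theta_ri) and r_im = r_i cos(theta_ri)."""
    r1 = cmath.rect(r1m / math.cos(th1), th1)
    r2 = cmath.rect(r2m / math.cos(th2), th2)
    return GM / C**2 * (1.0 / r1 - 1.0 / r2)


zc = z_conventional(R1M, R2M)
zm = z_complex(R1M, R2M, THETA, THETA).real                        # equation (411)
zm_415 = GM / C**2 * (math.cos(THETA)**2 / R1M - math.cos(THETA)**2 / R2M)
frac = (zc - zm) / zc   # equation (416): sin^2(theta) when theta_r1 = theta_r2
```

With equal phase angles at both heights the ratio is exactly $\sin^2\theta_r$. The small-angle form $\theta_r^2$ in equation (416) differs from it by about $\theta_r^4/3$.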


Therefore, approximately,

$$\theta_r^2 \approx (z_c - z_m)/z_c \approx 0.01 \tag{417}$$

where, according to References 47 and 48, the fractional difference between the calculated and measured gravitational red shift is 1%. Taking the negative root, because $\theta_r < 0$, equation (417) gives

$$\theta_r \approx -0.1\ \text{rad} \approx -5.7^\circ \tag{418}$$
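The inversion from the 1% red-shift discrepancy to $\theta_r$ is a one-line calculation. The Python sketch below shows the small-angle form of equation (417) alongside the exact $\sin^2$ inversion implied by equation (416). The second form assumes $\theta_{r1} = \theta_{r2}$ and is included only as a comparison.

```python
import math


def theta_r_small_angle(frac_diff):
    """Equation (417): theta_r^2 ~ (z_c - z_m)/z_c; negative root since theta_r < 0."""
    return -math.sqrt(frac_diff)


def theta_r_sin2(frac_diff):
    """Exact inversion of (z_c - z_m)/z_c = sin^2(theta_r), assuming theta_r1 = theta_r2."""
    return -math.asin(math.sqrt(frac_diff))


th = theta_r_small_angle(0.01)       # 1% fractional difference (References 47 and 48)
th_deg = math.degrees(th)            # about -5.73 degrees, as in equation (418)
th_exact = theta_r_sin2(0.01)        # about -0.1002 rad
```

At this size the two inversions differ by less than 0.2%, so the small-angle value in equation (418) is adequate.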

In this way a value of $\theta_r$ is obtained from the measurement of the gravitational red shift. This laboratory value of $\theta_r$ is probably more accurate than any values of $\theta_r$ that could be obtained from measurements of the apparent non-Newtonian variation of gravity in mine shafts, boreholes and towers. The value of $\theta_r$ given in equation (418) will be used to obtain values of $\theta_\psi$ and $\theta_\phi$ in section B.

B.  Analysis  of  the  Apparent  Non-Newtonian  Gravity  Measurements. 

Conflicting experimental data have been presented for the values of the gravitational constant derived from measurements of the variation of the force of gravity with distance in mine shafts, boreholes and towers. According to References 1 through 3, the values of the gravitational constant derived from mine shaft gravity variations are larger than those derived from Eötvös experiments conducted at the earth's surface. On the other hand, References 25 and 49 indicate that borehole measurements in the ice of a glacier produce values of the gravitational constant that are smaller than those derived from Eötvös experiments performed at the surface of the earth. In addition, Reference 5 indicates that the gravitational constant derived from gravity measurements on a tower is smaller than that measured by Eötvös experiments at the earth's surface. The experimental results given in References 1 through 3 are in conflict with those of References 5, 25 and 49. Therefore the numerical calculations in this section are done for each situation. According to the theory of Newtonian gravity in matter and vacuum with broken internal symmetries, the discrepancies between the measured and conventionally predicted values of the force of gravity in mine shaft, borehole, tower and Eötvös experiments are related to the values of $\theta_r$, $\theta_\psi$ and $\theta_\phi$.

Two cases will be examined in this section, according to the relative magnitudes of the internal phase angles $|\theta_r|$, $|\theta_\psi|$ and $|\theta_\phi|$.

Case 1: $|\theta_r(R)| \gg |\theta_\psi(R)|$ and $|\theta_r(R)| \gg |\theta_\phi(R)|$

Combining equations (392), (396) and (401) gives, for small $h/R$,

$$[G_\psi(R) - G_\phi(R)]/G \approx 6xy \tag{419}$$

$$[G_r(R\pm h) - G_\psi(R)]/G \approx 12xy \tag{420}$$

$$[G_r(R\pm h) - G_\phi(R)]/G \approx 18xy \tag{421}$$

where

$$x = \theta_r(R) < 0 \tag{422}$$

$$y \approx \theta_\psi(R) \approx \theta_{\psi,\psi}(R) \approx \theta_{s\psi}(R) \approx \theta_\phi(R) \approx \theta_{\phi,\phi}(R) \tag{423}$$

so that $|x| \gg |y|$. Because $x < 0$, the sign of $y$ determines the signs of the expressions in equations (419) through (421). The expressions in equations (420) and (421) can be either positive or negative. They can therefore describe either the experimental results of References 1 through 3, which require $y < 0$, or those of References 5, 25 and 49, which require $y > 0$. The value of $x = \theta_r(R)$ is known from equation (418). Determining any one of the three differences in equations (419) through (421) would therefore immediately determine the value of $y$.


Consider first the gravitational constant determined from mine shafts, which is about 0.6% larger than the value obtained from Eötvös experiments performed at the earth's surface.1-3 The orientation of these Eötvös experiments is unspecified, so the average of equations (420) and (421) is used. This yields

$$15xy = 15(-0.1)y \approx 0.006 \tag{424}$$

where $x \approx -0.1$ was obtained from equation (418). Equation (424) gives

$$\begin{aligned}x &= \theta_r(R) \approx -0.1\ \text{rad} \approx -5.7^\circ\\ y &\approx \theta_\psi(R) \approx \theta_\phi(R) \approx -0.004\ \text{rad} \approx -0.23^\circ\end{aligned} \tag{425}$$

From equation (419) the north-south/east-west asymmetry of the Eötvös experiment is given by

$$[G_\psi(R) - G_\phi(R)]/G \approx +0.0024 \tag{426}$$


Borehole data from a Greenland glacier show that the derived gravitational constant is about 2.8% smaller than the value of the gravitational constant derived from Eötvös experiments done at the earth's surface.25,49 Also, measurements of the variation of the gravity force up a tower give results for the gravitational constant that are 2.0% smaller than that obtained from Eötvös experiments at the surface of the earth.5 Therefore, using the average of the results of References 5 and 49 with the average of equations (420) and (421) gives

$$15xy = 15(-0.1)\,y \approx -0.024 \qquad (427)$$

where again $x \approx -0.1$ was obtained from equation (418). Then equation (427) yields

$$x = \theta_r(R) \approx -0.1\ \text{rad} = -5.7^\circ, \qquad y = \theta_\theta(R) - \theta_\phi(R) \approx +0.016\ \text{rad} \approx +0.92^\circ \qquad (428)$$

Equation (419) gives the north-south/east-west asymmetry of the Eötvös experiment to be

$$[G_\theta(R) - G_\phi(R)]/G \approx -0.0096 \qquad (429)$$


An independent determination of the north-south/east-west asymmetry of the Eötvös experiment would immediately determine the signs and values of $\theta_\theta(R)$ (and $\theta_\phi(R)$).
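The Case 1 arithmetic in equations (424) through (429) can be checked with a few lines of Python. This is only a sketch that assumes the small-h/R relations (419) through (421) as quoted and the value $x \approx -0.1$ rad from equation (418); the function name `case1_phase_difference` is hypothetical.

```python
import math

X_PRS = -0.1  # x = theta_r(R) in rad, taken from equation (418)


def case1_phase_difference(delta, x=X_PRS):
    """Case 1: solve 15*x*y = delta for y = theta_theta(R) - theta_phi(R).

    The factor 15 is the average of the coefficients 12 and 18 in
    equations (420) and (421). Returns (y in rad, y in degrees,
    north-south/east-west asymmetry 6*x*y from equation (419)).
    """
    y = delta / (15.0 * x)
    return y, math.degrees(y), 6.0 * x * y


# Mine-shaft data (References 1-3): G about 0.6% larger than at the surface.
y_mine, y_mine_deg, asym_mine = case1_phase_difference(0.006)
# Average of tower and borehole data (References 5 and 49): about 2.4% smaller.
y_tower, y_tower_deg, asym_tower = case1_phase_difference(-0.024)

print(f"mine:  y = {y_mine:+.4f} rad ({y_mine_deg:+.2f} deg), asymmetry {asym_mine:+.4f}")
print(f"tower: y = {y_tower:+.4f} rad ({y_tower_deg:+.2f} deg), asymmetry {asym_tower:+.4f}")
```

Both calls reproduce, to the printed precision, the values of $y$ and of the asymmetry quoted in equations (425), (426), (428) and (429).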

Case 2: equal internal phase angles at R (equation (433)).

Combining equations (392), (395) and (400) gives, for small h/R,

$$[G_\theta(R) - G_\phi(R)]/G \approx 3[(4w)^2 - (3w)^2] = 21w^2 \qquad (430)$$

$$[G_r(R \pm h) - G_\theta(R)]/G \approx 3[(3w)^2 - w^2] = 24w^2 \qquad (431)$$

$$[G_r(R \pm h) - G_\phi(R)]/G \approx 3[(4w)^2 - w^2] = 45w^2 \qquad (432)$$

where

$$w = \theta_r(R) = \theta_\theta(R) = \theta_\phi(R) = \cdots < 0 \qquad (433)$$


Thus $w < 0$ because $\theta_r(R) < 0$. The expressions in equations (431) and (432) are always positive, and therefore Case 2 agrees only with the experimental data given in References 1 through 3. Using the average of equations (431) and (432) (because the orientation of the Eötvös experiments is unspecified) and the 0.006 positive fractional difference between the values of the gravitational constant measured in a mine shaft and by an Eötvös experiment performed at the surface of the earth, given by References 1 through 3, yields

$$34.5w^2 = 0.006, \qquad w = \theta_r(R) = \theta_\theta(R) = \theta_\phi(R) \approx -0.76^\circ \qquad (434)$$

where $w$ is given by equation (433), and the negative root is taken because $w < 0$. Equation (430) gives the north-south/east-west asymmetry of the Eötvös experiment as

$$[G_\theta(R) - G_\phi(R)]/G \approx 0.0037 \qquad (435)$$

The predicted value $\theta_r(R) \approx -0.76^\circ$ is much smaller in magnitude than the value $\theta_r(R) \approx -5.7^\circ$ obtained from the Pound-Rebka-Snider experiment. Therefore Case 2, as represented by the assumption in equation (433), may not be physically realistic.
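A corresponding sketch for Case 2 solves equation (434) for the negative root and evaluates equation (430). As before, it assumes only the relations quoted above, and the helper name is hypothetical.

```python
import math


def case2_common_angle(delta):
    """Case 2: solve 34.5*w**2 = delta for w < 0.

    34.5 is the average of the coefficients 24 and 45 in equations (431)
    and (432). Returns (w in rad, w in degrees, asymmetry 21*w**2 from (430)).
    """
    if delta <= 0:
        # Equations (431) and (432) are always positive in Case 2.
        raise ValueError("Case 2 can only describe a positive fractional difference")
    w = -math.sqrt(delta / 34.5)  # negative root because w < 0, equation (433)
    return w, math.degrees(w), 21.0 * w * w


w, w_deg, asym = case2_common_angle(0.006)
print(f"w = {w:.5f} rad ({w_deg:.2f} deg), asymmetry {asym:.4f}")
```

The `ValueError` branch encodes the observation that Case 2 cannot describe the negative differences of References 5, 25 and 49.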


7. CONCLUSION. Newtonian gravity in space and time with broken internal symmetries produces an apparent non-Newtonian behaviour of the acceleration of gravity, and the gravitational constant varies with the radial distance from the center of a planet. This is because the pressure and coordinates in matter (and vacuum) exhibit broken symmetries that are represented by internal phase angles which vary with radial distance. The measured apparent non-Newtonian gravity effects are therefore due to the variation of the atmospheric pressure in mine shafts and boreholes and on towers, and this introduces an apparent 7 km finite-range force component. New forces in addition to Newtonian gravitation are not required to explain the experimental observations. The values of the internal phase angles of the coordinates can be obtained from the Pound-Rebka-Snider gravitational red shift experiment, the measurements of the apparent non-Newtonian gravity field, and the Eötvös experiments. The internal phase angles of space and time will influence the basic calculations of astrophysics and geophysics.
Figure 1. Sketch showing the dependence of the internal phase angles on zenith angle and their dependence on the latitude χ. The experimental situation is not yet clear, and the signs of some of these angles may be reversed.
Figure 2. Sketch showing variation of pressure with radial distance for the atmosphere, ocean and solid earth (not to scale).


Figure  3.  Sketch  showing  variation  of 
the  internal  phase  angle  of  the  pressure 
with  radial  distance  for  the  atmosphere, 
ocean  and  solid  earth  (not  to  scale). 
Figure 4. Sketch showing variation of the internal phase angle of the radial coordinate with radial distance for the atmosphere, ocean and solid earth (not to scale).
Figure 5. Sketch showing the variation of $G_r$, $G_\theta$ and $G_\phi$ with radial distance. Two possible cases for $G_\theta$ and $G_\phi$ are shown (not to scale).
WAVE PROPAGATION IN ASYMMETRIC MEDIA


Richard  A.  Weiss 

U.  S.  Army  Engineer  Waterways  Experiment  Station 
Vicksburg,  Mississippi  39180 


ABSTRACT. The coordinates of space and time have broken internal symmetries for a region of spacetime located in a pressure field, and perhaps even for the vacuum. Geometrical angles themselves have internal phase angles. A wave propagating in matter or the vacuum with broken internal symmetries will exhibit internal phase angles in its amplitude and dispersion characteristics. Cylindrical and spherical wave propagation in asymmetric matter is treated, and the solution of the wave equation with broken internal symmetries is obtained. The observed periodicities of waves in measured time and measured space require the propagation constants to be complex numbers, but the phase must be a real number. A pressure field is associated with a broken internal symmetry, and therefore waves propagating in matter under pressure are expected to exhibit broken symmetry effects in the propagation parameters. Applications to acoustic and seismic waves are suggested.

1. INTRODUCTION. Matter and radiation exist within the continuum of spacetime, and it has been suggested that spacetime imprints measurable effects on the properties of bulk matter and radiation. These effects have been calculated by the development of a gauge theory of relativistic thermodynamics.1 The effects of spacetime structure on matter and radiation occur in two ways. The first is through the Grüneisen parameter and bulk modulus, which enter the relativistic trace equation as a requirement of gauge invariance.1 The second is that the metric of spacetime affects the state equation of matter and radiation by requiring the thermodynamic functions, such as pressure and internal energy, to exhibit broken internal symmetries.2 At the same time the coordinates of points located within matter, radiation or the vacuum also have broken internal symmetries, and the internal phase angles of the coordinates must be determined simultaneously with the internal phase angles of the thermodynamic functions.3

All physical phenomena occurring within matter, radiation or the vacuum are affected by the broken symmetries of space and time. Electromagnetic and mechanical waves are expected to exhibit the effects of broken spacetime symmetry in both the wave amplitude and the dispersion equation. The wave amplitude, wavelength and frequency are characterized by internal phase angles and therefore must be represented as complex numbers. The speeds of sound and of electromagnetic waves in matter must be represented as complex numbers, but the light speed in vacuum is a real number.

The broken symmetry of the pressure in matter or vacuum is derived from a relativistic trace equation.2 In bulk matter or vacuum the space and time coordinates are complex numbers and are written as follows3
This paper considers the solution of a complex number wave equation whose space and time coordinates have broken internal symmetries. Section 2 considers the time dependence of periodic waves in broken symmetry matter where the periodicity occurs in the real part (the measured part) of the complex number time coordinate. Section 3 considers cylindrical waves with broken internal symmetry, and it is shown that the azimuthal angle equation has a complex number separation constant. The real part of the complex number azimuthal angle has the $0 \to 2\pi$ symmetry. The remaining coordinate equations also have complex number separation constants, which are determined by the requirement that the wave periodicity occurs in the real parts of the complex number radial and z coordinates. Section 4 considers spherical wave propagation in asymmetric matter and develops the equations describing the conditions of periodicity in the real parts of the radial, azimuthal and zenith angle coordinates.


2. PERIODIC VIBRATIONS IN SPACETIME WITH BROKEN INTERNAL SYMMETRIES. This section determines the relationship between the measured period and frequency from the experimental observation that waves and vibrations are periodic in measured space and time coordinates. The equation describing the time dependence of a periodic phenomenon in space and time with broken internal symmetries is written as a generalization of the standard scalar equation4-7
$$\frac{d^2\bar\xi}{d\bar t^2} + \bar\omega^2\bar\xi = 0 \qquad (8)$$

whose solution is

$$\bar\xi = \bar\xi_0\,e^{i\bar\omega\bar t} \qquad (9)$$

where $\bar t$ is given by equation (1) and where

$$\bar\omega = \omega e^{j\theta_\omega} = 2\pi/\bar T = (2\pi/T)e^{-j\theta_T} \qquad (10)$$

and therefore

$$\omega = 2\pi f = 2\pi/T, \qquad \theta_\omega = -\theta_T \qquad (11)$$

where $\bar\omega$ = complex number angular frequency whose magnitude and phase are $\omega$ and $\theta_\omega$ respectively, and $\bar T$ = complex number period whose magnitude and phase are $T$ and $\theta_T$ respectively.

The requirement that equation (9) represent a periodic wave in measured time implies that $\bar\omega\bar t$ is a real number. This can be seen by first writing $\bar\omega$ and $\bar t$ as complex numbers as follows

$$\bar\omega = \omega_R + j\omega_I = \omega(\cos\theta_\omega + j\sin\theta_\omega) \qquad (12)$$

$$\bar t = t_R + jt_I = t(\cos\theta_t + j\sin\theta_t) \qquad (13)$$

so that

$$\bar\omega\bar t = \omega_R t_R - \omega_I t_I + j(\omega_I t_R + \omega_R t_I) \qquad (14)$$

The reality condition gives

$$t_I = -\omega_I t_R/\omega_R \qquad (15)$$

and therefore

$$\bar\omega\bar t = t_R\,\omega^2/\omega_R = t_m\,\omega^2/\omega_m = \omega t \qquad (16)$$

where $t_m = t_R$ is the measured time and $\omega_m = \omega_R$ is the measured angular frequency. Therefore the reality of the phase $\bar\omega\bar t$ requires that the phase be linear in both the measured time $t_m$ and the time magnitude $t$. The fact that the phase is linear in the measured time agrees with the experimental fact that vibrating systems are periodic in the measured time coordinate.

The measured period $T_m = T_R = T\cos\theta_T$ is obtained from equations (12) and (16) to be

$$T_R\,\omega^2/\omega_R = T_R\,\omega/\cos\theta_\omega = \omega T = 2\pi \qquad (17)$$

and therefore

$$T_R = (2\pi/\omega)\cos\theta_\omega = (2\pi/\omega_R)\cos^2\theta_\omega = (1/f_R)\cos^2\theta_\omega \qquad (18)$$

where $f_R = \omega_R/(2\pi)$ = real part of the frequency. Therefore the relationship between the measured period and the measured angular frequency is

$$T_m = (2\pi/\omega)\cos\theta_\omega = (2\pi/\omega_m)\cos^2\theta_\omega = (1/f_m)\cos^2\theta_\omega \qquad (19)$$

or, for the measured frequency and the measured period,

$$f_m = (1/T_m)\cos^2\theta_\omega = (1/T_m)\cos^2\theta_T \qquad (20)$$

Note that $f_m \neq 1/T_m$. From equations (10), (11) and (16) it follows that

$$\theta_\omega = \theta_f = -\theta_T = -\theta_t \qquad (21)$$

Periodicity requires the internal phase of the frequency to adjust itself so as to satisfy equation (21). Combining equations (11), (16), (19) and (21) gives

$$T = 1/f \qquad (22)$$

$$t/f = t_m/f_m \qquad (23)$$

$$tf = t/T = t_m/T_m \qquad (24)$$

Thus the phase in equation (16) can be written as

$$\bar\omega\bar t = \omega t = 2\pi t_m/T_m = \omega_m t_m/\cos^2\theta_\omega \qquad (25)$$

and

$$f/T = f_R/T_R \qquad (26)$$

$$f/T = f_m/T_m \qquad (27)$$


The phase $\omega t$ has a period $T$ when expressed in terms of $t$, and a period $T_m$ when expressed in terms of the measured time $t_m$. The general solution of equation (8) is

$$\bar\xi = Ae^{i2\pi t_m/T_m} + Be^{-i2\pi t_m/T_m} = Ae^{i2\pi t/T} + Be^{-i2\pi t/T} \qquad (28)$$


The conclusions for broken symmetry space and time that $f_m$ and $T_m$ are related by equation (20) and that $f_m \neq 1/T_m$ may possibly be verified experimentally if $f_m$ and $T_m$ can be independently measured for the same periodic phenomenon. Finally, the conclusions of this section are based on the observed fact that periodic physical systems have definite periods in measured time.
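The reality condition and the relation $f_m T_m = \cos^2\theta_\omega$ of equation (20) can be illustrated numerically. In this sketch Python's imaginary unit `1j` stands in for the internal unit $j$ (the ordinary wave-phase unit $i$ is not needed), and the magnitudes and phase angles are illustrative assumptions, not values from the text.

```python
import cmath
import math

# Python's imaginary unit 1j plays the role of the internal unit j.
omega, theta_omega = 3.0, 0.2     # |omega_bar| and theta_omega (illustrative)
t, theta_t = 5.0, -theta_omega    # reality condition: theta_t = -theta_omega
theta_T = -theta_omega            # equation (11)

omega_bar = cmath.rect(omega, theta_omega)   # equation (12)
t_bar = cmath.rect(t, theta_t)               # equation (13)
phase = omega_bar * t_bar                    # equation (14)

omega_m, t_m = omega_bar.real, t_bar.real    # measured (real) parts
T = 2 * math.pi / omega                      # magnitude of the period, eq. (11)
T_m = T * math.cos(theta_T)                  # measured period T_R
f_m = omega_m / (2 * math.pi)                # measured frequency

print("imaginary part of phase:", phase.imag)                       # ~0
print("phase:", phase.real, t_m * omega**2 / omega_m, omega * t)    # eq. (16)
print("f_m*T_m:", f_m * T_m, "cos^2:", math.cos(theta_omega) ** 2)  # eq. (20)
print("1/T_m:", 1 / T_m, "f_m:", f_m)        # f_m differs from 1/T_m
```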

3. CYLINDRICAL WAVES IN ASYMMETRIC MATTER. The wave equation for cylindrical waves in matter with broken internal symmetry is written as a generalization of the standard scalar wave equation as follows4-7

$$\frac{\partial^2\bar u}{\partial\bar r^2} + \frac{1}{\bar r}\frac{\partial\bar u}{\partial\bar r} + \frac{1}{\bar r^2}\frac{\partial^2\bar u}{\partial\bar\phi^2} + \frac{\partial^2\bar u}{\partial\bar z^2} = \frac{1}{\bar v^2}\frac{\partial^2\bar u}{\partial\bar t^2} \qquad (29)$$

where $\bar u$ = wave amplitude with internal phase, and $\bar v$ = complex number phase velocity. Equation (29) can be solved by the standard technique of separation of variables4-7

$$\bar u = Z(\bar z)\,\Phi(\bar\phi)\,R(\bar r)\,\xi(\bar t) \qquad (30)$$

which gives the following simple complex number generalization of the standard equations for cylindrical waves4-7

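$$\frac{d^2\Phi}{d\bar\phi^2} + \bar M^2\Phi = 0 \qquad (31)$$

$$\frac{d^2 Z}{d\bar z^2} + \bar k_z^2 Z = 0 \qquad (32)$$

$$\bar r^2\frac{d^2 R}{d\bar r^2} + \bar r\frac{dR}{d\bar r} + (\bar k_r^2\bar r^2 - \bar M^2)R = 0 \qquad (33)$$

$$\frac{d^2\xi}{d\bar t^2} + \bar\omega^2\xi = 0 \qquad (34)$$

where $\bar M$ = constant, and $\bar k_z$ and $\bar k_r$ are constants that are related by4-7

$$\bar k_z^2 = \bar k^2 - \bar k_r^2 \qquad (35)$$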

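where $\bar k$ is defined by

$$\bar k = \bar\omega/\bar v \qquad (36)$$

Using

$$\bar k = ke^{j\theta_k} \qquad (37)$$

gives

$$k = \omega/v \qquad (38)$$

$$\theta_k = \theta_\omega - \theta_v \qquad (39)$$

which determines $k$ and $\theta_k$. Writing $\bar k_z$ and $\bar k_r$ as

$$\bar k_z = k_z e^{j\theta_{kz}} = k_{zR} + jk_{zI}, \qquad \bar k_r = k_r e^{j\theta_{kr}} = k_{rR} + jk_{rI} \qquad (40)$$

where $k_{zR} = k_z\cos\theta_{kz}$, $k_{zI} = k_z\sin\theta_{kz}$, $k_{rR} = k_r\cos\theta_{kr}$ and $k_{rI} = k_r\sin\theta_{kr}$, allows equation (35) to be written as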


$$k_z^2\cos(2\theta_{kz}) = k^2\cos(2\theta_k) - k_r^2\cos(2\theta_{kr}) \qquad (41)$$

$$k_z^2\sin(2\theta_{kz}) = k^2\sin(2\theta_k) - k_r^2\sin(2\theta_{kr}) \qquad (42)$$


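Corresponding to the phase velocity given by equation (36), the complex number group velocity is given by

$$\bar v_g = v_g e^{j\theta_{vg}} = d\bar\omega/d\bar k \qquad (43)$$

Therefore

$$v_g = \cos\theta_{k,k}\,[(d\omega/dk)^2 + (\omega\,d\theta_\omega/dk)^2]^{1/2} \qquad (44)$$

$$\theta_{vg} = \theta_\omega + \theta_{\omega,\omega} - \theta_k - \theta_{k,k} = \theta_v + \theta_{\omega,\omega} - \theta_{k,k} \qquad (45)$$

where

$$\tan\theta_{\omega,\omega} = \frac{\omega\,d\theta_\omega/dk}{d\omega/dk} \qquad (46)$$

$$\tan\theta_{k,k} = k\,d\theta_k/dk \qquad (47)$$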
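Equations (44) through (47) follow from the chain rule $d\bar\omega/d\bar k = (d\bar\omega/dk)/(d\bar k/dk)$, with the magnitude $k$ used as the real parameter. The sketch below checks this numerically for a hypothetical smooth model of $\omega(k)$, $\theta_\omega(k)$ and $\theta_k(k)$ (none of these model functions come from the text). `math.atan2` is used to evaluate equation (46) so that the branch is chosen from the signs of numerator and denominator, and `1j` stands in for the internal unit $j$.

```python
import cmath
import math


# Hypothetical smooth model: every quantity is a function of the magnitude k.
def omega(k):
    return 2.0 * k + 0.1 * k**3      # |omega_bar|(k)


def domega_dk(k):
    return 2.0 + 0.3 * k**2


def theta_omega(k):
    return 0.05 * k                  # internal phase of the frequency


def dtheta_omega_dk(k):
    return 0.05


def theta_k(k):
    return 0.02 * k**2               # internal phase of the wave number


def dtheta_k_dk(k):
    return 0.04 * k


def group_velocity_numeric(k, h=1e-6):
    """d(omega_bar)/d(k_bar), equation (43), by central differences in k."""
    def omega_bar(q):
        return cmath.rect(omega(q), theta_omega(q))

    def k_bar(q):
        return cmath.rect(q, theta_k(q))

    d_omega = (omega_bar(k + h) - omega_bar(k - h)) / (2 * h)
    d_k = (k_bar(k + h) - k_bar(k - h)) / (2 * h)
    return d_omega / d_k


def group_velocity_closed_form(k):
    """Magnitude and phase of v_g_bar from equations (44) through (47)."""
    theta_kk = math.atan(k * dtheta_k_dk(k))                              # (47)
    theta_ww = math.atan2(omega(k) * dtheta_omega_dk(k), domega_dk(k))   # (46)
    v_g = math.cos(theta_kk) * math.hypot(domega_dk(k), omega(k) * dtheta_omega_dk(k))  # (44)
    theta_vg = theta_omega(k) + theta_ww - theta_k(k) - theta_kk         # (45)
    return v_g, theta_vg


k = 1.5
numeric = group_velocity_numeric(k)
v_g, theta_vg = group_velocity_closed_form(k)
print(abs(numeric), v_g)
print(cmath.phase(numeric), theta_vg)
```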

The  solution  to  equations  (31)  through  (33)  will  now  be  considered. 

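A. Solution of the Φ Equation.

Consider now the solution of equation (31) and the determination of the complex number constant $\bar M$. The solution of equation (31) can be written as

$$\Phi = Ae^{i\bar M\bar\phi} + Be^{-i\bar M\bar\phi} \qquad (48)$$

It will now be shown that $\bar M\bar\phi$ must be a real number if $\Phi$ is to be a periodic function of the measured azimuthal angle $\phi_R = \phi\cos\theta_\phi$. Writing the complex number $\bar M$ as

$$\bar M = Me^{j\theta_M} = M_R + jM_I \qquad (49)$$

allows the phase $\bar M\bar\phi$ in equation (48) to be written as

$$\bar M\bar\phi = M_R\phi_R - M_I\phi_I + j(M_I\phi_R + M_R\phi_I) \qquad (50)$$

The reality of $\bar M\bar\phi$ gives

$$M_I\phi_R + M_R\phi_I = 0 \qquad (51)$$

and

$$\bar M\bar\phi = M\phi = \phi_R M^2/M_R, \qquad \theta_M = -\theta_\phi \qquad (52)$$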


For a periodicity of the form $\Phi(\phi_R) = \Phi(\phi_R + 2\pi)$ it is required that

$$M^2/M_R = m \qquad (53)$$

where $m$ = positive integer. From equation (49) it follows that

$$M_R = M\cos\theta_M = M\cos\theta_\phi \qquad (54)$$

because $\theta_M = -\theta_\phi$ from equation (52). Combining equations (53) and (54) gives

$$M = m\cos\theta_\phi \qquad (55A)$$

$$M_R = m\cos^2\theta_\phi \qquad (55B)$$

$$M_I = -m\cos\theta_\phi\sin\theta_\phi \qquad (55C)$$

as the condition for the function $\Phi$ to be periodic in $\phi_R$ with period $2\pi$. Therefore $M$ is not an integer (for $\theta_\phi \neq 0$), and equations (48) and (52) show that the wave amplitude is not periodic in the variable $\phi$. Equation (48) can be written as

$$\Phi = Ae^{im\phi_R} + Be^{-im\phi_R} \qquad (56)$$

which is periodic in $\phi_R$. Note that $\bar M\bar\phi = m\phi_R$, and that $\bar M$ is a complex number in equation (31). Traditionally equation (31) accepts only integer values of the separation constant, but for waves in asymmetric space and time the separation constant is the complex number $\bar M$. The reality condition on the phase $\bar M\bar\phi$ is

$$\theta_M = -\theta_\phi \qquad (57)$$

which is the equation for evaluating the phase angle $\theta_M$. The internal phase angle of the magnetic quantum number must adjust itself in such a way that equation (57) is valid for periodic waves.
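Equations (55A) through (57) can be checked with a short sketch: build $\bar M$ from an integer $m$ and an assumed internal phase $\theta_\phi$, then confirm that $\bar M\bar\phi$ is real and equal to $m\phi_R$. The numerical values are illustrative only, the helper name is hypothetical, and `1j` stands in for the internal unit $j$.

```python
import cmath
import math


def separation_constant(m, theta_phi):
    """M_bar = M e^{j theta_M} with M = m cos(theta_phi), eq. (55A),
    and theta_M = -theta_phi, eq. (57); 1j stands in for j."""
    return cmath.rect(m * math.cos(theta_phi), -theta_phi)


m, theta_phi = 3, 0.3                  # integer m, illustrative internal phase
M_bar = separation_constant(m, theta_phi)
print(M_bar.real, m * math.cos(theta_phi) ** 2)                     # (55B)
print(M_bar.imag, -m * math.cos(theta_phi) * math.sin(theta_phi))   # (55C)
print(abs(M_bar))                      # M = m cos(theta_phi) is not an integer

phi = 1.7                              # magnitude of the azimuthal angle
phi_bar = cmath.rect(phi, theta_phi)
phi_R = phi_bar.real                   # measured azimuthal angle
phase = M_bar * phi_bar
print(phase, m * phi_R)                # real and equal to m * phi_R
```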


B. Solution of the Z Equation.

Equation (32) has the following formal solution

$$Z = Ce^{i\bar k_z\bar z} + De^{-i\bar k_z\bar z} \qquad (58)$$

The exponent term in equation (58) can be written as

$$\bar k_z\bar z = k_{zR}z_R - k_{zI}z_I + j(k_{zI}z_R + k_{zR}z_I) \qquad (59)$$

and the reality requirement for the exponent term in internal space gives

$$\bar k_z\bar z = k_z z = z_R k_z^2/k_{zR} = z_R k_z/\cos\theta_{kz} \qquad (60)$$

which also gives $\theta_{kz} = -\theta_z$. If the waves propagate in the z direction with a measured spatial wavelength $L_{zR} = L_z\cos\theta_z$, then equation (60) gives

$$L_{zR}k_z/\cos\theta_{kz} = 2\pi \qquad (61)$$

for periodicity with wavelength $L_{zR}$. Therefore

$$k_z = (2\pi/L_{zR})\cos\theta_{kz} = 2\pi/L_z \qquad (62)$$

$$k_{zR} = (2\pi/L_{zR})\cos^2\theta_{kz} = (2\pi/L_z)\cos\theta_{kz} = k_z\cos\theta_{kz} \qquad (63)$$

The reality condition on $\bar k_z\bar z$ shows that

$$\theta_{kz} = -\theta_z = -\theta_{Lz} \qquad (64)$$

Equation (63) shows that $k_{zR} \neq 2\pi/L_{zR}$. The solution given in equation (58) can be rewritten as

$$Z = Ce^{ik_z z} + De^{-ik_z z} = Ce^{i2\pi z_R/L_{zR}} + De^{-i2\pi z_R/L_{zR}} \qquad (65)$$

The internal phase angle of the wave number must adjust itself to the local broken symmetry of spacetime such that equation (64) is satisfied for periodic waves. Equations (58) or (65) are the general solutions for plane waves in the z direction, and equation (64) holds for $-\infty < z < \infty$. It is possible that $\bar k_z$ is an imaginary number in real space, so that $\bar k_z = i\bar\kappa_z$; then the solution to equation (32) is attenuating in nature and given by

$$Z = Ce^{\bar\kappa_z\bar z} + De^{-\bar\kappa_z\bar z} \qquad (66)$$

and apparently $\bar\kappa_z\bar z$ need not be a real number in internal space, because there is no periodicity requirement in the z direction for this case.
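The plane-wave conditions (60) through (65) can be illustrated the same way. In this sketch the measured wavelength, the internal phase $\theta_z$ and the sample point are illustrative assumptions, and `1j` again stands in for the internal unit $j$; the same pattern applies to the far-field radial relations (71) through (75).

```python
import cmath
import math

L_zR, theta_z = 2.0, 0.25              # measured wavelength, internal phase (illustrative)
L_z = L_zR / math.cos(theta_z)         # intrinsic wavelength: L_zR = L_z cos(theta_z)
theta_kz = -theta_z                    # equation (64)
kz_bar = cmath.rect(2 * math.pi / L_z, theta_kz)   # magnitude from equation (62)
k_zR = kz_bar.real                     # equation (63)

z_bar = cmath.rect(0.8, theta_z)       # a point whose coordinate has phase theta_z
z_R = z_bar.real                       # measured coordinate
phase = kz_bar * z_bar                 # exponent of equation (58) without the factor i

print(phase.imag)                                  # ~0: reality condition
print(phase.real, 2 * math.pi * z_R / L_zR)        # equation (65)
print(k_zR, 2 * math.pi / L_zR)                    # k_zR differs from 2*pi/L_zR
```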

C. Solution of the R Equation.

The radial equation (33) is similar to the standard radial equation of vibration theory, except that $R$, $\bar r$, $\bar k_r$ and $\bar M$ are complex numbers. The formal solution to equation (33) can be written as a generalization of the standard result for scalar coordinates as follows4-7

$$R = AJ_{\bar M}(\bar k_r\bar r) + BN_{\bar M}(\bar k_r\bar r) \qquad (67)$$

which represents standing waves, where $J_{\bar M}$ = complex number Bessel function and $N_{\bar M}$ = complex number Neumann function of complex order $\bar M$. The progressive wave solutions to equation (33) are4-7

$$R = AH^{(1)}_{\bar M}(\bar k_r\bar r) + BH^{(2)}_{\bar M}(\bar k_r\bar r) \qquad (68)$$

where $H^{(1)}_{\bar M}$ and $H^{(2)}_{\bar M}$ = complex number Hankel functions of the first and second kind of order $\bar M$. The asymptotic values of the Hankel functions are given by the following generalization of the standard results4-7

$$H^{(1)}_{\bar M}(\bar k_r\bar r) \sim [2/(\pi\bar k_r\bar r)]^{1/2}e^{i(\bar k_r\bar r - \bar M\pi/2 - \pi/4)} \qquad (69)$$

$$H^{(2)}_{\bar M}(\bar k_r\bar r) \sim [2/(\pi\bar k_r\bar r)]^{1/2}e^{-i(\bar k_r\bar r - \bar M\pi/2 - \pi/4)} \qquad (70)$$

If equations (69) and (70) represent in-going and out-going waves which have a periodicity in the measured radial coordinate $r_R = r\cos\theta_r$, it follows that the phase $\bar k_r\bar r$ must be a real number in the far field, as $r \to \infty$, so that

$$\bar k_r\bar r = k_r r = r_R k_r^2/k_{rR} = r_R k_r/\cos\theta_{kr} \qquad (71)$$

$$\theta_{kr} = -\theta_r(r \to \infty) \qquad (72)$$

where $k_{rR} = k_r\cos\theta_{kr}$.

Let the waves in the far field propagate in the r direction with a measured spatial wavelength $L_{rR} = L_r\cos\theta_r$, where $L_r$ = intrinsic spatial wavelength in the r direction. Then from equation (71)

$$L_{rR}k_r/\cos\theta_{kr} = 2\pi \qquad (73)$$

$$k_r = (2\pi/L_{rR})\cos\theta_{kr} = 2\pi/L_r \qquad (74)$$

$$k_{rR} = (2\pi/L_{rR})\cos^2\theta_{kr} = (2\pi/L_r)\cos\theta_{kr} = k_r\cos\theta_{kr} \qquad (75)$$

so that $k_{rR} \neq 2\pi/L_{rR}$. Equations (71) through (75) hold only in the far field, because only in this region are the wavelengths $L_r$ and $L_{rR}$ defined. The presence of periodic waves requires the broken symmetry of the wavelength to adjust itself so as to satisfy equation (72) at large distances from the source of the waves. In the far field of asymmetric waves, equations (69) and (70) can be rewritten as

$$H^{(1)}_{\bar M}(\bar k_r\bar r) \sim [2/(\pi\bar k_r\bar r)]^{1/2}e^{ik_r r}e^{-i(\pi/2)(\bar M + 1/2)} \qquad (76)$$

$$H^{(2)}_{\bar M}(\bar k_r\bar r) \sim [2/(\pi\bar k_r\bar r)]^{1/2}e^{-ik_r r}e^{i(\pi/2)(\bar M + 1/2)} \qquad (77)$$

These solutions represent progressive waves.

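As a special case, consider the solution for standing waves in a vibrating membrane located in space and time with broken internal symmetries. The solution of the wave equation for this case is an obvious generalization of the standard results for scalar quantities6

$$\bar u = CJ_{\bar M}(\bar k_r\bar r)\cos(\bar M\bar\phi + \alpha)\,e^{i\bar\omega\bar t} \qquad (78)$$

$$\bar u = CJ_{\bar M}(\bar k_r\bar r)\cos(m\phi_R + \alpha)\,e^{i\omega t} \qquad (79)$$

where for a membrane $\bar k_r = \bar k$. The small-argument expansion of the Bessel function of order $\bar M$ is given by the following generalization of the standard scalar results (the constant normalization factor is omitted)

$$J_{\bar M}(\bar k_r\bar r) \propto (\bar k_r\bar r)^{\bar M}\{1 - \bar k_r^2\bar r^2/[4(\bar M + 1)] + \bar k_r^4\bar r^4/[32(\bar M + 1)(\bar M + 2)] - \cdots\} \qquad (80)$$

Using equations (49) and (80) gives

$$J_{\bar M}(\bar k_r\bar r) \propto (k_r r)^x e^{-y + jw}\{1 - \bar k_r^2\bar r^2/[4(\bar M + 1)] + \bar k_r^4\bar r^4/[32(\bar M + 1)(\bar M + 2)] - \cdots\} \qquad (81)$$

where

$$x = M\cos\theta_M = m\cos^2\theta_\phi \qquad (82)$$

$$y = M(\theta_{kr} + \theta_r)\sin\theta_M = m(\theta_{kr} + \theta_r)\sin\theta_M\cos\theta_\phi \qquad (83)$$

$$w = m(\theta_{kr} + \theta_r)\cos^2\theta_\phi + m\ln(k_r r)\cos\theta_\phi\sin\theta_M \qquad (84)$$

The case $\bar M = 0$ gives

$$J_0 = 1 - \tfrac{1}{4}\bar k^2\bar r^2 + \tfrac{1}{64}\bar k^4\bar r^4 - \cdots \qquad (85)$$

In equations (78) and (79) $\bar k_r\bar r$ is not a real number, because the vibrations are not periodic in the radial direction.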
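The split of the prefactor $(\bar k_r\bar r)^{\bar M}$ into $(k_r r)^x e^{-y + jw}$ in equations (81) through (84) can be verified with Python's principal complex power. This sketch checks only the prefactor, not the bracketed series; all numerical values are illustrative assumptions, and `1j` stands in for the internal unit $j$.

```python
import cmath
import math

m, theta_phi = 2, 0.15                 # integer m and internal phase of phi (illustrative)
M = m * math.cos(theta_phi)            # equation (55A)
theta_M = -theta_phi                   # equation (57)
M_bar = cmath.rect(M, theta_M)

kr, theta_kr, theta_r = 0.4, 0.05, -0.02    # |k_r r|, theta_kr, theta_r (illustrative)
arg = cmath.rect(kr, theta_kr + theta_r)    # k_r_bar * r_bar, with 1j standing in for j

direct = arg ** M_bar                       # principal value of (k_r_bar r_bar)^M_bar
x = m * math.cos(theta_phi) ** 2                                          # (82)
y = M * (theta_kr + theta_r) * math.sin(theta_M)                          # (83)
w = (m * (theta_kr + theta_r) * math.cos(theta_phi) ** 2
     + m * math.log(kr) * math.cos(theta_phi) * math.sin(theta_M))       # (84)
split = kr ** x * cmath.exp(-y + 1j * w)

print(direct)
print(split)                                # agrees with the direct power
```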

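4. SPHERICAL WAVES IN ASYMMETRIC MATTER. A simple generalization of the standard wave equation for spherical waves gives the following equation, which describes spherical waves in space and time with broken internal symmetries4-7

$$\frac{\partial^2\bar u}{\partial\bar r^2} + \frac{2}{\bar r}\frac{\partial\bar u}{\partial\bar r} + \frac{1}{\bar r^2\sin\bar\psi}\frac{\partial}{\partial\bar\psi}\left(\sin\bar\psi\,\frac{\partial\bar u}{\partial\bar\psi}\right) + \frac{1}{\bar r^2\sin^2\bar\psi}\frac{\partial^2\bar u}{\partial\bar\phi^2} = \frac{1}{\bar v^2}\frac{\partial^2\bar u}{\partial\bar t^2} \qquad (86)$$

Separating the complex number wave amplitude as

$$\bar u = R(\bar r)\,W(\bar\psi)\,\Phi(\bar\phi)\,\xi(\bar t) \qquad (87)$$

gives

$$(1 - \bar\mu^2)\frac{d^2W}{d\bar\mu^2} - 2\bar\mu\frac{dW}{d\bar\mu} + \left[\bar\ell(\bar\ell + 1) - \frac{\bar M^2}{1 - \bar\mu^2}\right]W = 0 \qquad (88)$$

$$\bar r^2\frac{d^2R}{d\bar r^2} + 2\bar r\frac{dR}{d\bar r} + [\bar k^2\bar r^2 - \bar\ell(\bar\ell + 1)]R = 0 \qquad (89)$$

$$\frac{d^2\Phi}{d\bar\phi^2} + \bar M^2\Phi = 0 \qquad (90)$$

$$\frac{d^2\xi}{d\bar t^2} + \bar\omega^2\xi = 0 \qquad (91)$$

where $\bar\mu = \cos\bar\psi$ and $\bar k = \bar\omega/\bar v$.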
A. Solution of the W Equation.

The solution of equation (88) can be obtained as a complex number generalization of the associated Legendre polynomials.7 The complex number associated Legendre polynomials can be obtained from equation (88) by writing7

$$W = (1 - \bar\mu^2)^{\bar M/2}f(\bar\ell, \bar M, \bar\mu) \qquad (92)$$

where

$$f = \sum_{s=0}^{\infty} c_s\bar\mu^s \qquad (93)$$

By direct substitution one finds4-7

$$c_2 = \tfrac{1}{2}[\bar M(\bar M + 1) - \bar\ell(\bar\ell + 1)]c_0 \qquad (94)$$

$$c_3 = \tfrac{1}{6}[(\bar M + 1)(\bar M + 2) - \bar\ell(\bar\ell + 1)]c_1 \qquad (95)$$

$$c_4 = \tfrac{1}{12}[(\bar M + 2)(\bar M + 3) - \bar\ell(\bar\ell + 1)]c_2 \qquad (96)$$

$$c_5 = \tfrac{1}{20}[(\bar M + 3)(\bar M + 4) - \bar\ell(\bar\ell + 1)]c_3 \qquad (97)$$

The following is the complex number generalization of the standard scalar results4-7

$$\frac{c_{\nu+2}}{c_\nu} = \frac{(\nu + \bar M)(\nu + \bar M + 1) - \bar\ell(\bar\ell + 1)}{(\nu + 1)(\nu + 2)} \qquad (98)$$

where $\nu$ = integer. Break-off (terminating) polynomial solutions can exist even when $\bar M$ and $\bar\ell$ are complex numbers, provided that they are related by

$$\bar\ell = \bar M + \nu \qquad (99)$$

because the numerator of equation (98) then vanishes at that value of $\nu$. Here the integer $\nu$ must have the value

$$\nu = \ell - m \qquad (100)$$

where $\ell$ and $m$ = integer separation constants. Combining equations (99) and (100) gives

$$\bar\ell = \bar M + \ell - m \qquad (101)$$

Equation (99) reduces to equation (100) for symmetric spacetime, where $\bar\ell = \ell$ and $\bar M = m$.
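The break-off condition can be seen directly by running the recursion (98) with a complex $\bar M$. In the sketch below, $\bar M$ is an illustrative value and the helper name is hypothetical; choosing $\bar\ell = \bar M + 2$ (even series) or $\bar\ell = \bar M + 3$ (odd series) makes every coefficient beyond the leading polynomial vanish, and the surviving coefficients match the polynomials of equations (110) and (111).

```python
def series_coefficients(M_bar, ell_bar, parity, n_terms=8):
    """Coefficients c_s of f = sum_s c_s mu**s from the recursion (98).

    Starts from c_0 = 1 (parity 0, even series) or c_1 = 1 (parity 1, odd
    series); M_bar and ell_bar may be complex.
    """
    c = [0j] * n_terms
    c[parity] = 1 + 0j
    for nu in range(parity, n_terms - 2, 2):
        numerator = (nu + M_bar) * (nu + M_bar + 1) - ell_bar * (ell_bar + 1)
        c[nu + 2] = c[nu] * numerator / ((nu + 1) * (nu + 2))
    return c


M_bar = complex(1.9, -0.3)                               # illustrative complex M_bar
even = series_coefficients(M_bar, M_bar + 2, parity=0)   # ell_bar = M_bar + 2
odd = series_coefficients(M_bar, M_bar + 3, parity=1)    # ell_bar = M_bar + 3
print(even[:5])   # c_2 = -(2 M_bar + 3), c_4 = 0: compare equation (110)
print(odd[:6])    # c_3 = -(2 M_bar + 5)/3, c_5 = 0: compare equation (111)
```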

From equation (101), writing $\bar\ell = Le^{j\theta_L}$, it follows that

$$L\cos\theta_L = M\cos\theta_M + \ell - m \qquad (102)$$

$$L\sin\theta_L = M\sin\theta_M \qquad (103)$$

From equations (102) and (103) it follows that

$$\tan\theta_L = \frac{M\sin\theta_M}{M\cos\theta_M + \ell - m} \qquad (104)$$

$$L = \ell\,[1 - (m/\ell)(2 - m/\ell)\sin^2\theta_\phi]^{1/2} \qquad (105)$$

where, from the analysis of the $\Phi$ equation given in Section 3, it follows that

$$M = m\cos\theta_\phi \qquad (106)$$

$$\theta_M = -\theta_\phi \qquad (107)$$

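The complex number associated Legendre polynomial solutions can then be written as

$$W = (1 - \bar\mu^2)^{\bar M/2}, \qquad \bar\ell = \bar M \qquad (108)$$

$$W = (1 - \bar\mu^2)^{\bar M/2}\,\bar\mu, \qquad \bar\ell = \bar M + 1 \qquad (109)$$

$$W = (1 - \bar\mu^2)^{\bar M/2}[(2\bar M + 3)\bar\mu^2 - 1], \qquad \bar\ell = \bar M + 2 \qquad (110)$$

$$W = (1 - \bar\mu^2)^{\bar M/2}[(2\bar M + 5)\bar\mu^3 - 3\bar\mu], \qquad \bar\ell = \bar M + 3 \qquad (111)$$

Note that formally $W$ is given by the following associated Legendre polynomials

$$W = P_{\bar\ell}^{\bar M}(\bar\mu) \qquad (112)$$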

B. Solution of the R Equation.

The solution of the complex number radial equation (89) can be obtained by formal analogy to the solution of the real number version of equation (89), and the result for standing waves in asymmetric matter is4-7

$$R = Aj_{\bar\ell}(\bar k\bar r) + Bn_{\bar\ell}(\bar k\bar r) \qquad (113)$$

while for progressive waves in asymmetric matter it is11

$$R = Ah^{(1)}_{\bar\ell}(\bar k\bar r) + Bh^{(2)}_{\bar\ell}(\bar k\bar r) \qquad (114)$$

where $j_{\bar\ell}(\bar k\bar r)$ = complex number spherical Bessel function of order $\bar\ell$, $n_{\bar\ell}(\bar k\bar r)$ = complex number spherical Neumann function of order $\bar\ell$, $h^{(1)}_{\bar\ell}(\bar k\bar r)$ = complex number spherical Hankel function of the first kind of order $\bar\ell$, and $h^{(2)}_{\bar\ell}(\bar k\bar r)$ = complex number spherical Hankel function of the second kind of order $\bar\ell$. These functions are defined as generalizations of the corresponding real-valued functions as follows
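$$j_{\bar\ell}(\bar k\bar r) = [\pi/(2\bar k\bar r)]^{1/2}J_{\bar\ell+1/2}(\bar k\bar r) \qquad (115)$$

$$n_{\bar\ell}(\bar k\bar r) = [\pi/(2\bar k\bar r)]^{1/2}N_{\bar\ell+1/2}(\bar k\bar r) \qquad (116)$$

$$h^{(1)}_{\bar\ell}(\bar k\bar r) = [\pi/(2\bar k\bar r)]^{1/2}[J_{\bar\ell+1/2}(\bar k\bar r) + iN_{\bar\ell+1/2}(\bar k\bar r)] \qquad (117)$$

$$h^{(2)}_{\bar\ell}(\bar k\bar r) = [\pi/(2\bar k\bar r)]^{1/2}[J_{\bar\ell+1/2}(\bar k\bar r) - iN_{\bar\ell+1/2}(\bar k\bar r)] \qquad (118)$$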

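The asymptotic expansions derived from equations (115) through (118) are4-7

$$j_{\bar\ell}(\bar k\bar r) \to (1/\bar k\bar r)\sin(\bar k\bar r - \bar\ell\pi/2) \qquad (119)$$

$$n_{\bar\ell}(\bar k\bar r) \to -(1/\bar k\bar r)\cos(\bar k\bar r - \bar\ell\pi/2) \qquad (120)$$

$$h^{(1)}_{\bar\ell}(\bar k\bar r) \to (1/\bar k\bar r)\,e^{i[\bar k\bar r - (\bar\ell + 1)\pi/2]} \qquad (121)$$

$$h^{(2)}_{\bar\ell}(\bar k\bar r) \to (1/\bar k\bar r)\,e^{-i[\bar k\bar r - (\bar\ell + 1)\pi/2]} \qquad (122)$$

In order for equations (119) through (122) to describe periodic waves in the far field, the following conditions must hold

$$\bar k\bar r = kr, \qquad \theta_k + \theta_r = 0, \qquad r \to \infty \qquad (123)$$

Thus, for instance, the replacement $\bar k\bar r = kr$ can be made in the right-hand sides of equations (119) through (122). Therefore the internal phase angle of the wave number of spherical waves in the far field must adjust itself to the local broken symmetry of space such that

$$\theta_k = -\theta_r(r = \infty) \qquad (124)$$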

5. CONCLUSION. Waves propagating in matter or spacetime with broken symmetries will have complex number separation constants $\bar M$ and $\bar\ell$. For spherical waves the separation constants must be related by $\bar\ell - \bar M$ = integer. The observed periodicities of the waves in measured time and measured space require complex number separation constants, wave numbers, frequencies and coordinates. However, the quantities $\bar\omega\bar t$, $\bar M\bar\phi$ and $\bar k\bar r$ must be real numbers for periodic waves. Applications to seismic and acoustic waves are possible because the earth's gravity induces a broken symmetry in the coordinates.
ABSTRACT

Two applications of the Front Tracking method form the basis of this paper.

A formulation of the small anisotropy hypothesis for nonlinear elastic deformation is given which is fully rotationally covariant and which is thermodynamically consistent, in the sense that it is derived from a specific internal energy. An algorithm for the solution of the Riemann problem for nonlinear elasticity is presented. This algorithm uses Godunov-type iterations. For the uniaxial deformations of an isotropic material, the Godunov iterations occur in one-dimensional spaces, while in the general case the iterations are at most in a two-dimensional space.

Slug flow is studied in the context of Hele-Shaw cells. The transition from laminar to slug flow is the main object of study.
1. Introduction.


The nonlinear deformations of an elastic body are described by a hyperbolic system of quasilinear equations. The constitutive relations close the system and can be characterized through the dependence of the specific internal energy $W$ on the strain tensor $E = \tfrac{1}{2}(F^{T}F - I)$.

For hyperelastic materials these relations are the stress-strain relations: in Eulerian coordinates, $\sigma^{ij} = \rho\,f^{j}_{\alpha}\,\partial W/\partial f^{i}_{\alpha}$ [8,9], where $(\sigma^{ij})$ is the Cauchy stress tensor, $f^{i}_{\alpha} = \rho_0^{-1}F^{i}_{\alpha}$ the Eulerian deformation gradient and $\rho_0$ the material density.


For deformations small in shear, we model the specific internal energy in terms of the strain tensor $E$ (or the deformation gradient $F$) by a third-order approximation in the effective shear strain $\varepsilon$ [3],

$$W(F, S) = W(E, S) = W_H(\gamma, S) + \frac{1}{\rho_0}\,G_0(\gamma, S)\,\varepsilon^2, \qquad (1.1)$$

where $S$ is the entropy, $\gamma \equiv (1/3)\operatorname{tr}E$ is the mean compressive strain and $\varepsilon^2 \equiv (\operatorname{dev}E)_{ij}(\operatorname{dev}E)_{ij}$. $G_0(\gamma, S)$ is the shear modulus at $\varepsilon = 0$ for the corresponding hydrostatic strain $\gamma$, and $W_H$ is a hydrostatic energy. Here $\gamma$, $\varepsilon^2$ and $\omega^3 \equiv \det(\operatorname{dev}E)$ form a complete set of invariants for the strain tensor $E$, with the third invariant $\omega^3$ satisfying $\omega^3 = O(\varepsilon^3)$ [3].
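The invariants entering equation (1.1) can be computed with a small dependency-free sketch. It assumes the Lagrangian strain $E = \tfrac{1}{2}(F^{T}F - I)$, and the functions `W_H` and `G_0` passed to `specific_energy` are hypothetical caller-supplied models, not the constitutive functions of the paper. The final check uses the fact that a traceless symmetric tensor with Frobenius norm $\varepsilon$ satisfies $|\det| \le \varepsilon^3/(3\sqrt{6})$, which is one way to see that $\omega^3 = O(\varepsilon^3)$.

```python
import math


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def det3(A):
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))


def strain_invariants(F):
    """Return (gamma, eps2, omega3) for E = (F^T F - I)/2, an assumed strain measure."""
    C = matmul(transpose(F), F)
    E = [[0.5 * (C[i][j] - (1.0 if i == j else 0.0)) for j in range(3)] for i in range(3)]
    gamma = (E[0][0] + E[1][1] + E[2][2]) / 3.0                      # (1/3) tr E
    dev = [[E[i][j] - (gamma if i == j else 0.0) for j in range(3)] for i in range(3)]
    eps2 = sum(dev[i][j] ** 2 for i in range(3) for j in range(3))   # effective shear strain squared
    return gamma, eps2, det3(dev)                                    # omega^3 = det(dev E)


def specific_energy(F, S, W_H, G_0, rho_0):
    """Equation (1.1) with caller-supplied (hypothetical) W_H(gamma, S) and G_0(gamma, S)."""
    gamma, eps2, _ = strain_invariants(F)
    return W_H(gamma, S) + G_0(gamma, S) * eps2 / rho_0


F = [[1.02, 0.01, 0.0], [0.0, 0.99, 0.005], [0.0, 0.0, 1.0]]
gamma, eps2, omega3 = strain_invariants(F)
print(gamma, eps2, omega3)
print(abs(omega3) <= eps2 ** 1.5 / (3 * math.sqrt(6)))   # omega^3 = O(eps^3)
```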

The solution waves in elasticity are of three types: predominantly longitudinal (or pressure), predominantly transverse (or shear) and a thermo-contact. In the nonlinear case, the shear waves split into two modes, radial and angular shear, while in the linear case the two shear wave speeds coincide. The elastic system expressed in Eulerian conserved variables, for a uniaxial deformation, is given in terms of the fundamental variables $\rho$ (density), $v^i$ (velocity), $\sigma^i$ (Cauchy stress vector), $f^i$ (Eulerian deformation vector) and $e = W(f, S)$ (specific internal energy) [9],


fp  +  £(pv‘)  =  0. 

(1.2a) 

£(pm  jy*’)  =  £<v/') 

II 

K) 

UJ 

(1.2b) 

i(pv,)+i(p,'v')  =  i(°‘) 

ir  1.2.3  , 

(1.2c) 

i-jV.-V*  +«)  +■“  P^-V.-V*  +tf)V‘ 

=  £(V'°‘)- 

( 1 .2d) 
Here f and σ are defined by $f^i = f^i_1$ and $\sigma^i = \sigma^{i1}$. The density satisfies the relation $\rho = \rho_0 J^{-1}$, with $J = \det F$.


Fig.  1.1  The  elastic  Riemann  solution 


The elastic state is written as U, with $v = (u, v, w)$ and $\sigma = (\sigma, \tau, \theta)$, where $\sigma^1 = \sigma$, $\sigma^2 = \tau\cos\theta$ and $\sigma^3 = \tau\sin\theta$. The elastic Riemann solution consists of eight constant states $U^i_a$, $a = l, r$, $i = 0, 1, 2, 3$, separated by seven waves (see Fig. 1.1). Of these waves, the middle one is a slip line of speed $\lambda = u^3_l = u^3_r$, and the remaining waves come in pairs. The fast waves are mainly longitudinal, in the sense that the change in the deformation vector occurs mainly in the direction of propagation of the waves. The other two pairs are mainly transverse. The slower transverse waves correspond to necking, while the faster transverse waves are linearly degenerate and correspond to torque [4].
2. Numerical Solution


We relate the Riemann problem in elasticity to gas dynamics, where the longitudinal waves are pressure waves and the shear wave is a contact of zero speed. Using this perspective, we employ a Godunov-type iteration [2,6] to solve the Riemann problem. Each step of the iteration is divided into two blocks: in the first block there are only pressure waves, and in the second we solve both shear modes simultaneously.

This decomposition matters because it reduces the dimension of the space in which the Godunov iteration operates. In the most favorable case each iteration takes place in a one-dimensional space, while the general case (non-isotropic material or non-uniaxial intermediate deformation) involves at most a two-dimensional iteration. This is in contrast to the seven dimensions of the state space [4].

The first block consists of the 1-waves. In analogy with gas dynamics, we treat the shear and thermal waves as a "wide" slip line with a "surface tension" that causes a prescribed jump in σ and u. This slip line lies between the two pressure waves. We solve the 1-waves for σ and u, while τ, θ, v and w are left free.

The change of σ and u across this line (the shear waves) is given by two parameters, $\Delta\sigma = \sigma^1_r - \sigma^1_l$ and $\Delta u = u^1_r - u^1_l$. The Riemann problem, formulated in this manner, consists of only two waves. We use a Godunov iteration method to evaluate $\sigma^1_a$ and $u^1_a$. Finally, the values of $\sigma^1_a$ and $u^1_a$ determine the complete states $U^1_a$, for $a = l, r$.

The second block consists of the shear and thermal waves. We solve these waves for the variables τ, θ, v and w, with the conditions

$$\theta^3_r = \theta^3_l, \qquad w^3_r = w^3_l, \qquad v^3_r = v^3_l, \qquad \tau^3_r = \tau^3_l,$$

while σ and u are free variables. For isotropic materials and uniaxial left and right states, the degeneracy of the 2-waves allows us to solve these waves explicitly. We use a Godunov-type iteration method for the remaining shear waves.

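Proceeding in this fashion, σ and u will have a jump across the slip line, and the difference will depend on the initial parameters Δσ and Δu. Thus, we define the function

$$\mathcal{F}\begin{pmatrix}\Delta\sigma\\ \Delta u\end{pmatrix} = \begin{pmatrix}\sigma^3_r - \sigma^3_l\\ u^3_r - u^3_l\end{pmatrix}. \qquad (2.1)$$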
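A solution to the Riemann problem is given by parameters Δσ and Δu such that $\mathcal{F}(\Delta\sigma, \Delta u) = 0$. If we write

$$\mathcal{F}\begin{pmatrix}\Delta\sigma\\ \Delta u\end{pmatrix} = \begin{pmatrix}\Delta\sigma\\ \Delta u\end{pmatrix} - Q\begin{pmatrix}\Delta\sigma\\ \Delta u\end{pmatrix}, \qquad (2.2)$$

we can construct the iteration. Given $(\Delta\sigma, \Delta u)^{(0)}$, let

$$\begin{pmatrix}\Delta\sigma\\ \Delta u\end{pmatrix}^{(n+1)} = Q\begin{pmatrix}\Delta\sigma\\ \Delta u\end{pmatrix}^{(n)}, \qquad n \ge 0. \qquad (2.3)$$

From equation (2.2), the zeros of $\mathcal{F}$ correspond to the fixed points of Q, which are the limits sought by the iteration (2.3).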
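The outer iteration (2.2)-(2.3) is an ordinary fixed-point iteration in the two parameters (Δσ, Δu). The Python sketch below shows only that outer loop. The `mismatch` function is a hypothetical stand-in for one pass through the two blocks (pressure waves, then shear waves), which is where a real solver does its work. The toy mismatch is chosen so that $Q(x) = x - \mathcal{F}(x)$ is a contraction; a real mismatch need not have this property.

```python
import numpy as np

def fixed_point(mismatch, x0, tol=1e-12, max_iter=200):
    """Iterate x^(n+1) = Q(x^(n)) with Q(x) = x - mismatch(x), cf. (2.2)-(2.3).

    x = (delta_sigma, delta_u); a fixed point of Q is a zero of the mismatch.
    Returns the fixed point and the number of iterations used.
    """
    x = np.asarray(x0, dtype=float)
    for n in range(1, max_iter + 1):
        x_new = x - mismatch(x)          # Q(x)
        if np.linalg.norm(x_new - x) < tol:
            return x_new, n
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical toy mismatch standing in for the two-block Riemann solve.
# Its unique zero is at `target`, and Q contracts by a factor of at most 0.55.
target = np.array([0.3, -0.1])

def mismatch(d):
    return 0.5 * (d - target) + 0.05 * np.sin(d - target)

root, iterations = fixed_point(mismatch, x0=[0.0, 0.0])
print(root, iterations)
```

Plain fixed-point iteration converges only linearly. Any acceleration (for example a secant update on (Δσ, Δu)) would be a design choice beyond what the text describes.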


3. Front Tracking in Elasticity.

Fig. 3.1 Four computations of the same elastic Riemann problem (top two panels: left torque wave tracked).

In Figure 3.1 we compare four computational solutions of the same elastic Riemann problem. The main point of this comparison is to illustrate the advantage (or necessity) of tracking the torque wave. The variable shown is the conserved quantity $\rho f^3$, plotted at approximately the same physical time, t = 3.17. We use the Riemann solver described in the previous section together with a higher-order Godunov scheme [7], in a form proposed and designed by I-L. Chern [1].

The top two frames used grids of 50 and 100 points, respectively, and an algorithm that tracked the left 2-wave (torque). The bottom frames correspond to runs with grids of 500 and 100 points in which no waves are tracked. We observe that the stiffness of the material produces two shear waves of similar, but slightly different, speeds. This, together with numerical diffusion on the linearly degenerate wave, "hides" the intermediate state between the shear waves, producing a single shear wave in the untracked computation.

For finer grids (500 points) we observe a "ripple" where the torque wave should appear, which is still negligible compared with the size of the actual discontinuity (see Figure 3.2). Note that in Figure 3.2, 15 mesh cells are used within the ripple region, but the waves are still underresolved. Tracking the linearly degenerate wave, on the other hand, forces the shear waves to be resolved independently, thus preserving the wave structure of the solution regardless of the grid size.


Fig. 3.2 Untracked computation, fine grid.


4.  Umbilic  Points 


For isotropic materials, the dependence of W on f is through $f = f^1$ and $g^2 = (f^2)^2 + (f^3)^2$. In the previous section we assumed, for these materials, a particular ordering of three eigenvalues: those of the nonlinear pressure and necking waves ($\lambda_1$ and $\lambda_3$, respectively, near the reference state) and that of the linearly degenerate torque waves ($\lambda_2$). The assumed ordering is $\lambda_1 \ge \lambda_2 \ge \lambda_3$.

A potential difficulty arises at states where the two shear waves cross over, or where the eigenvalues of the nonlinear waves (pressure and necking) coincide. These states are called umbilic points. It follows from the symmetry of $W(f, g^2)$ in g that the line g = 0 is a locus of umbilic points, where $\lambda_2 = \lambda_1$ or $\lambda_2 = \lambda_3$. These points are double (either $\lambda_2 = \lambda_1$ or $\lambda_2 = \lambda_3$) or triple ($\lambda_1 = \lambda_2 = \lambda_3$). We refer to the double points as shear umbilic points and to the triple points as nonlinear umbilic points, or simply umbilic points.

Near the undeformed state the pressure waves are always faster than the shear waves, but this ordering is reversed when crossing an umbilic point. Therefore the double points on the umbilic line g = 0 always correspond to the two shear wave speeds coinciding, hence the name shear umbilic points. The shear umbilic points have a fairly simple mathematical structure. This is reflected, for example, in the fact that the torque waves are linearly degenerate waves (contact discontinuities) across which only the angle θ changes.

To study the occurrence of umbilic points, we examined the small shear model of the specific internal energy for several common materials (aluminum, copper, lead, platinum, ...) and searched for coinciding eigenvalues. We observe three things. First, the umbilic points occur only on the line g = 0. Second, the ordering of the waves prevails. Third, the points at which $\lambda_1 = \lambda_3$ lie outside the elastic region. They lie in the region of plastic compression, well within experimentally accessible limits.

We now describe the small shear constitutive law. In the formulation

$$W(F,S) = W_h(\gamma,S) + \frac{1}{\rho_0}\,G_0(\gamma,S)\,\varepsilon^2, \qquad (4.1)$$


the hydrostatic energy $W_h$ is given by a stiffened gamma law (4.2), where V and γ are related by

$$\left(\frac{V}{V_0}\right)^2 = (2\gamma + 1)^3.$$

The shear modulus is taken from the Steinberg-Cochran-Guinan formulation

$$G(V,S) = G_0\left[1 + A\,P\,\eta^{-1/3} + B\,(T - 300)\right] \qquad (4.3)$$

($G_0$, A and B are tabulated constants, and η is the compression, $\eta = \rho/\rho_0$). Here the pressure P and temperature T are considered to be hydrostatic quantities; i.e., $P = -\partial W_h/\partial V$ and $T = \partial W_h/\partial S$.

To describe the elastic region, we consider the Steinberg-Cochran-Guinan expression [10] for the yield strength Y, in the von Mises sense,

$$Y(V,S) = \frac{G}{G_0}\,Y_0\left[1 + \beta(\varepsilon + \varepsilon_i)\right]^n, \qquad (4.4)$$

with the constraint $Y_0\left[1 + \beta(\varepsilon + \varepsilon_i)\right]^n \le Y_{\max}$.

The coefficients $G_0$, A, B, $Y_0$, $Y_{\max}$, β, n and Γ are obtained from Steinberg-Cochran-Guinan [10] (see Table 1).
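A direct transcription of (4.3) and (4.4) as Python functions may help when tabulating the constitutive law. The function names and argument order are hypothetical. Units follow Table 1 (Mbar for moduli and strengths). B must be given per kelvin and enters (4.3) with its own sign, whereas Table 1 lists −B in kK⁻¹. The strain arguments `eps` and `eps_i` are passed straight through to (4.4).

In the usage example, the values of A, B, β and n are illustrative placeholders, not tabulated constants.

```python
def scg_shear_modulus(G0, A, B, P, eta, T):
    """Shear modulus, eq. (4.3): G = G0 [1 + A P eta^(-1/3) + B (T - 300)]."""
    return G0 * (1.0 + A * P * eta ** (-1.0 / 3.0) + B * (T - 300.0))

def scg_yield_strength(G, G0, Y0, Ymax, beta, n, eps, eps_i):
    """Yield strength, eq. (4.4), with the cap Y0 [1 + beta (eps + eps_i)]^n <= Ymax."""
    hardening = min(Y0 * (1.0 + beta * (eps + eps_i)) ** n, Ymax)
    return (G / G0) * hardening

# At zero pressure and T = 300 K the shear modulus reduces to G0.
G = scg_shear_modulus(G0=0.276, A=1.0, B=-1.0e-4, P=0.0, eta=1.0, T=300.0)
Y_initial = scg_yield_strength(G, G0=0.276, Y0=0.0029, Ymax=0.0068,
                               beta=125.0, n=0.1, eps=0.0, eps_i=0.0)
Y_capped = scg_yield_strength(G, G0=0.276, Y0=0.0029, Ymax=0.0068,
                              beta=125.0, n=0.1, eps=1.0e6, eps_i=0.0)
print(G, Y_initial, Y_capped)
```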

We calculate the simple uniaxial compression needed to reach the umbilic point ($\lambda_3 = \lambda_1$) at constant entropy. For the materials we consider (see Table 2), the umbilic points lie on the line g = 0, and the compression values vary from 8% to 14.5%.

At the same time, the elastic region is determined by the effective shear stress $\tau_e$ through the (von Mises) inequality $3(\tau_e)^2 \le 2Y^2$. From the expression (1.1) for the internal energy, we have

$$(\tau_e)^2 = 4\rho^2\left(\frac{\partial W}{\partial(\varepsilon^2)}\right)^2\varepsilon^2 = 4G^2\varepsilon^2,$$

and therefore

$$\varepsilon^2 \le \frac{1}{6}\,\frac{Y^2}{G^2} \le \frac{1}{6}\,\frac{Y_{\max}^2}{G^2}.$$

This relation provides bounds for the elastic uniaxial compression $\eta = J^{-1} = (\det F)^{-1}$. For a number of common metals the bounds are less than 1.3% in compression (see Table 2). This shows that the umbilic point lies outside the elastic region.
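The min. and max. columns of Table 2 can be reproduced in a few lines of Python. For a uniaxial deformation $F = \mathrm{diag}(\lambda,1,1)$ the invariants give $\varepsilon^2 = (\lambda^2-1)^2/6$. The bound $\varepsilon^2 \le Y_{\max}^2/(6G^2)$ then becomes $|\lambda^2 - 1| \le Y_{\max}/G$, and the compression is η = 1/λ.

The sketch assumes G ≈ G0 and Y = Ymax. This is our reading of how the tabulated values were obtained: with the Table 1 values of G0 and Ymax, it matches the min. and max. columns of Table 2. The name `elastic_compression_bounds` is hypothetical.

```python
import math

def elastic_compression_bounds(G0, Ymax):
    """Elastic range of the uniaxial compression eta = 1/det F.

    Uses eps^2 = (lam^2 - 1)^2 / 6 for F = diag(lam, 1, 1) together with
    eps^2 <= Ymax^2 / (6 G0^2), i.e. |lam^2 - 1| <= Ymax / G0.
    """
    r = Ymax / G0
    eta_min = 1.0 / math.sqrt(1.0 + r)   # stretching branch (lam > 1)
    eta_max = 1.0 / math.sqrt(1.0 - r)   # compression branch (lam < 1)
    return eta_min, eta_max

# G0 and Ymax in Mbar, taken from Table 1.
for name, G0, Ymax in [("Al", 0.276, 0.0068), ("Cu", 0.477, 0.0064), ("W", 1.6, 0.04)]:
    lo, hi = elastic_compression_bounds(G0, Ymax)
    print(f"{name}: {lo:.6f} .. {hi:.6f}")
```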
Material                   Al        Au         Cu        Pb           Pt        W
ρ0 (g/cc)                  2.702     18.88      8.92      [illegible]  21.45     19.35
M (g/mol)                  26.98     196.967    63.546    207.19       195.09    183.85
[symbol illegible] (kbar)  2.4970    2.3901     3.5005    1.36474      2.7417    2.6316
G0 (Mbar)                  .276      .28        .477      .086         .637      1.6
Y0 (Mbar)                  .0029     .0002      .0012     .00008       .0003     .022
Ymax (Mbar)                .0068     .00225     .0064     .001         .0034     .04

A (Mbar⁻¹), −B (kK⁻¹), Γ: 6.5, 3.8, 2.8, 11.6, 25, .94, .62, .31, .38, .14, 2.97, 3.99, 3.02, 2.67 (column assignment illegible in the source).

Table 1. Material constants used to define the constitutive law for a number of common metals.


Material    min. compr.   max. compr.   umbilic compr.   press. (kbar)
Aluminum    .987904       1.012551      1.117324         .8944
Gold        .996006       1.004042      1.127150         .3245
Copper      .993358       1.006777      1.130208         1.4207
Lead        .994236       1.005865      1.080368         .43235
Platinum    .997342       1.002679      1.145728         1.621
Tungsten    .987730       1.012739      1.144691         1.0278

Table 2. The minimum and maximum uniaxial compression for the elastic region, tabulated for common metals along with the compression and pressure at the umbilic point. The main point of this table is that, for a number of common metals, the umbilic point occurs in uniaxial compression at experimentally attainable pressures, within the plastic range.


5.  Slug  Flow 

The instability of the interface between two viscous fluids has been studied extensively. The best-known example is the Taylor-Saffman problem, which concerns a finger growing on an interface between two fluids in a narrow, two-dimensional channel, i.e., a Hele-Shaw cell. The equations for a two-fluid Hele-Shaw flow can be written as

$$\mathbf{v} = -\mu_i^{-1}\nabla P, \quad i = 1,2, \qquad \nabla\cdot\mathbf{v} = 0, \qquad (5.1)$$

where v, $\mu_i$ and P are the velocity, viscosities and pressure. The indices i = 1, 2 refer to the two fluids.

In contrast to the classical Taylor-Saffman problem, we have studied the reversal of fingering stability caused by a strong transverse flow field (see Figure 5.1). Fingers growing into a strong transverse flow have qualitative features not present in the classical Taylor-Saffman instability. Narrow fingers are produced in the Taylor-Saffman stable case, while wide fingers or nonfingering behavior occur in the Taylor-Saffman unstable case (see Figure 5.2).

We consider a geometry defined by a rectangle with no-flow boundary conditions at the top and bottom. Inflow is prescribed through the left edge, in the form of separate channels of two distinct fluids (see Figure 5.1a).

The basic physical properties of Hele-Shaw flow in the geometry of Figure 5.1a are determined by three dimensionless parameters: the ratio of the inflow velocities $V = v_2/v_1$, the mobility ratio $M = \mu_2/\mu_1$, and the width ratio $W = l_2/(l_1 + l_2)$ of the two channels. When $MV \ne 1$ in the two channels, the flow depicted in Figure 5.1a is not in equilibrium, and fingering may result. To better understand the initiation of fingering, we write $\mathbf v = \mathbf v' + (\mathbf v - \mathbf v')$. Here the uniform field $\mathbf v'$ satisfies $MV' = 1$, and $\mathbf v - \mathbf v'$ satisfies zero-outflow conditions on the right boundary (see Figure 5.1b).

Two time scales are essential to the flow patterns. The time $t_s$ characterizes the transport of the finger downstream. The pinch-off time $t_p$ is the characteristic time for a single finger to grow to the height of the channel width. The ratio β of the two time scales is a function of V, M and W. It can be used as an order parameter to classify the distinct flow regimes and the transitions between them. When β > 1, pinch-off occurs repeatedly and turns a laminar flow into slug flow (see Figure 5.3a). Even if the inflow boundary condition for Hele-Shaw flow is set to its equilibrium value, in a channel of variable width the instability
Figure 5.1. The central channel is solvent located between two layers of oil. (a) The velocity field of the total flow v. The injection rate on the right boundary satisfies MV = 1. The left boundary velocity ratio V is larger than the equilibrium value, i.e. MV > 1, which results in an excess inflow of oil. (b) The velocity field of the nonuniform flow v − v′. There is no flow across the right boundary. The excess inflow of oil produces the transverse field that drives the solvent flow backward at the inlet.

develops locally, which can also produce a laminar-to-slug flow transition (see Figure 5.3b).
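The equilibrium condition MV = 1 follows directly from (5.1). The short derivation below assumes, for illustration, that in undisturbed laminar flow the two parallel channels share the same uniform streamwise pressure gradient ∂P/∂x.

```latex
\begin{aligned}
v_1 &= -\frac{1}{\mu_1}\frac{\partial P}{\partial x}, \qquad
v_2 = -\frac{1}{\mu_2}\frac{\partial P}{\partial x}
\quad\Longrightarrow\quad
V = \frac{v_2}{v_1} = \frac{\mu_1}{\mu_2} = \frac{1}{M}, \\
MV &= 1 .
\end{aligned}
```

If the inflow ratio is forced away from this value (MV ≠ 1), the two channels cannot share a single streamwise pressure gradient. Under this reading, the resulting transverse pressure differences are what drive the nonuniform field v − v′ of Figure 5.1b.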

A  detailed  presentation  of  the  results  of  this  section  will  be  given  separately  [5]. 
Figure 5.2. Finger growing on the interface of two viscous fluids, where $\mu_2 > \mu_1$. (a) Excess flow of the fluid of viscosity $\mu_2$. Here V = 1, M = 5 and W = 0.4. (b) Excess flow of the fluid of viscosity $\mu_1$. Here V = 1, M = 40 and W = 0.5. The mesh sizes for cases (a) and (b) are 30 × 10 and 41 × 17 for the hyperbolic equation, and 60 × 30 and 70 × 30 for the elliptic equation.


Figure 5.3. (a) Consecutive pinch-off turns a laminar flow into slug flow. Here V = 1, M = 5 and W = 0.92. The mesh size is 42 × 14 for the hyperbolic equation and 84 × 30 for the elliptic equation. (b) The transition from laminar to slug flow in a channel of variable width.
ABSTRACT

Synthesis of linear antenna arrays can be formulated as an integral equation by treating a linear array as radiation from a line source. This is a Fredholm equation of the first kind, which is difficult to solve numerically by straightforward methods. The difficulty is overcome by exploiting the pattern theorem of T. T. Taylor, using an iterative procedure to refine Taylor's analytical solution. This numerical method can be used to tailor the beam so that a number of small sidelobes are of equal size and a few isolated nulls can be forced over certain regions. The method is illustrated for one- and two-dimensional apertures.


INTRODUCTION 

One definition of an optimum antenna is a design in which all the sidelobes are of equal level. However, this is impossible, since by Parseval's theorem it would involve infinite energy. The best one can hope for is a design in which a finite number of sidelobes are of equal level, and the level of the remaining sidelobes tends to zero at infinity.

Dolph [1] derived the optimum current distribution for linear arrays with a finite number of elements using the properties of Tchebycheff polynomials. Van der Maas [2] obtained a simple asymptotic formula for the space factor when the number of radiating elements becomes very large. Using this formula and analytic properties of the space factor, Taylor [3] showed that this optimum or ideal space factor is not realizable, because the current distribution is singular near the ends of the aperture. Taylor then gave a practical method to avoid such singularities and obtain what are termed Taylor weights. This widely used method provides practical weights for linear arrays.

The basic idea behind Taylor's method is to bring a selected number of zeros of the space factor closer to the center of the visible region while keeping the zeros in the far region at integer values. This is accomplished with Woodward's synthesis procedure [4]. The one-dimensional space factor is then given analytically in compact form. With this method the sidelobes are no longer of equal magnitude, as they are in the Dolph-Tchebycheff method; instead, they decay slowly from the designed sidelobe ratio.

In this paper, rather than use an analytical method, we reduce the solution of the integral equation to a set of algebraic equations using the Woodward synthesis technique. The illumination function (i.e., the current distribution), as well as the pattern, is easily computed. The results are identical to those of Taylor. As in Taylor's result, however, the sidelobes are not of equal magnitude. We then iterate on the zeros to correct this. After only a few iterations, the desired zeros and sidelobes of equal amplitude are obtained.

The problem is then formulated for specified sidelobe levels, and the method is illustrated by some numerical examples. The generalization to an arbitrary set of nulls or near-nulls (i.e., low sidelobes) then becomes clear.

The integral equation approach to antenna synthesis assumes a continuous distribution of radiators; in practice this distribution will be discretized. The method illustrated here will help in understanding the discretization procedure for periodic as well as aperiodic (unequally spaced) arrays, array thinning, and sub-arraying in large array synthesis.
APERTURE INTEGRAL EQUATION


In this section we formulate the one-dimensional aperture problem using the Hertzian vector potential [5]. Although the method is quite general, a few simplifying assumptions are made:

- the one-dimensional approximation;
- the usual antenna approximation, neglecting higher-order terms;
- a monopole rather than a dipole, ignoring any other elemental characteristics of the antenna.

The method can be extended to more general cases as desired.

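Consider a linear oscillator source with dimensions as shown in figure 1. The Hertzian vector π is given by [5] (note $\pi_x = \pi_y = 0$):

$$\pi_z = \frac{i}{4\pi\varepsilon\omega}\int_{-l/2}^{l/2} u(\xi)\,\frac{e^{ik|\mathbf r - \xi\hat{\mathbf z}|}}{|\mathbf r - \xi\hat{\mathbf z}|}\,d\xi,$$

where u(ξ) is the current distribution (illumination function) along the one-dimensional element. Using the antenna approximation, the above reduces to:*

$$\pi_z \approx \frac{i}{4\pi\varepsilon\omega}\,\frac{e^{ikR}}{R}\int_{-l/2}^{l/2} u(\xi)\,e^{ik\xi\cos\theta}\,d\xi. \qquad (1)$$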

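Using equation (1), the field can be computed as follows:

$$\mathbf E = \nabla(\nabla\cdot\boldsymbol\pi) + k^2\boldsymbol\pi, \qquad \mathbf H = -i\omega\varepsilon\,\nabla\times\boldsymbol\pi.$$

One-term approximations of the pertinent quantities, equations (2) and (3), then follow. In particular, the far electric field is

$$E_\theta \approx -\frac{i\mu_0\omega\,e^{i(kR-\omega t)}}{4\pi R}\,\sin\theta\int_{-l/2}^{l/2} u(\xi)\,e^{ik\xi\cos\theta}\,d\xi. \qquad (3)$$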


Equation (3) illustrates the well-known principle of multiplication of elemental patterns and array factors [6]. We further simplify equation (3) to the monopole case:

$$E \approx -\frac{i\mu_0\omega\,e^{i(kR-\omega t)}}{4\pi R}\int_{-l/2}^{l/2} u(\xi)\,e^{ik\xi\cos\theta}\,d\xi. \qquad (4)$$


For a desired asymptotic electric field, equation (4) gives the integral equation for the current distribution on the line source. After normalization the integral equation becomes:

$$S(u) = k\int_{-a}^{a} e^{ikux}\,G(x)\,dx, \qquad u = \cos\theta, \quad -a < x < a, \qquad (5)$$

where $a = l/2$, $k = 2\pi/\lambda$, and

$$E = -\frac{i\mu_0\omega\,e^{i(kr-\omega t)}}{4\pi r}\,S(u).$$

* This procedure is similar to the Huygens source approximation.

Equation  (5)  is  the  main  integral  equation  to  be  worked  with.  It  has  been  derived  here  in  a 
simple  manner.  Alternatively,  it  may  be  derived  by  considering  discrete  isotropic  elements  and 
approaching  the  continuum  limit  [7]. 


NUMERICAL PROCEDURE

Consider a symmetric function G(x). Equation (5) then reduces to:

$$2k\int_0^a G(x)\cos\!\left(\frac{2\pi x\cos\theta}{\lambda}\right)dx = S(\cos\theta), \qquad 0 \le \theta \le \pi, \qquad (6)$$

where G(x) is related to the excitation or illumination function in the aperture, and $k = 2\pi/\lambda$, with λ the wavelength. We introduce $p = \pi x/a$, $u = 2a\cos\theta/\lambda$, $g(p) = G(x)$ and $F(u) = \frac{\lambda}{4a}S(\cos\theta)$, with the visible range $-2a/\lambda \le u \le 2a/\lambda$. Equation (6) then becomes:

$$\int_0^{\pi} g(p)\cos(pu)\,dp = F(u), \qquad u \ge 0. \qquad (7)$$

Once g(p) is known, F(u) is defined for all u by equation (7). In our case, however, we have to solve for g(p). The only information we have is the asymptotic zeros of F(u). These were derived by Taylor [4], [8] and are obtained by the asymptotic method in Appendix A. Equation (A1) of the appendix shows that, to avoid singularities in the unknown function, we should place the asymptotic zeros at the integer points. This corresponds to α = 0 in the form of g(p) near the end points of the aperture, i.e. a pedestal illumination. We further assume that g(p) is infinitely differentiable, finite, and nonzero at p = π.

Following Woodward's synthesis [3], let g(p) have the Fourier expansion

$$g(p) = \frac{D_0}{2} + \sum_{m=1}^{\infty} D_m\cos(mp) \qquad (8)$$

for $0 \le p \le \pi$; otherwise $g(p) = 0$ for $p > \pi$.


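Substituting equation (8) into equation (7), we have:

$$F(u) = \frac{\pi D_0}{2}\,\frac{\sin(\pi u)}{\pi u} + \frac{\pi}{2}\sum_{m=1}^{\infty} D_m\left[\frac{\sin(\pi(u+m))}{\pi(u+m)} + \frac{\sin(\pi(u-m))}{\pi(u-m)}\right]. \qquad (9)$$

From (9) we have:

$$F(0) = \frac{\pi}{2}D_0, \qquad F(m) = \frac{\pi}{2}D_m, \quad m \ge 1. \qquad (10)$$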


Denoting F(m) by $F_m$, we have from equations (8) and (10):

$$g(p) = \frac{1}{\pi}\left(F_0 + 2\sum_{m=1}^{\infty} F_m\cos(mp)\right). \qquad (11)$$


Suppose that we decide that our far-field pattern F(u) should have zeros at u = m for m > M; then:*

$$g(p) = \frac{1}{\pi}\left(F_0 + 2\sum_{m=1}^{M} F_m\cos(mp)\right). \qquad (12)$$


This gives:

$$F(u) = a_0(u)\,F_0 + \sum_{m=1}^{M} F_m\,a_m(u), \qquad (13)$$

where

$$a_0(u) = \frac{\sin(\pi u)}{\pi u}, \qquad a_m(u) = \frac{\sin(\pi(u+m))}{\pi(u+m)} + \frac{\sin(\pi(u-m))}{\pi(u-m)}. \qquad (14)$$


From (13) and (14) we have F(n) = 0 for n > M. Hence Taylor's synthesis theorem is satisfied for a finite value of the illumination function.**

We normalize so that $F(0) = F_0 = 1$. We then have M unknowns, $F_1, F_2, \dots, F_M$, at our disposal to control the size of the sidelobes and the positions of the nulls. Knowledge of the $F_m$ also gives g(p) from equation (12). We now set up M simultaneous linear algebraic equations for the $F_m$. It is convenient to discuss this in three stages.
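Equations (12)-(14) are straightforward to evaluate numerically. The Python sketch below uses hypothetical function names and NumPy's normalized `np.sinc`, which is $\sin(\pi x)/(\pi x)$. Thus $a_0(u) = \mathrm{sinc}(u)$ and $a_m(u) = \mathrm{sinc}(u+m) + \mathrm{sinc}(u-m)$.

The coefficients in the usage example are arbitrary illustrative values, not a designed pattern. The checks confirm three things:

- F(n) = F_n for n ≤ M;
- F(n) = 0 for n > M;
- (12) and (13) are consistent through (7).

```python
import numpy as np

def pattern(u, F):
    """Space factor F(u), eq. (13), with F_0 = 1 and F = [F_1, ..., F_M]."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    F = np.asarray(F, dtype=float)
    m = np.arange(1, len(F) + 1)
    a0 = np.sinc(u)                                          # eq. (14)
    am = np.sinc(u[:, None] + m) + np.sinc(u[:, None] - m)   # eq. (14)
    return a0 + am @ F

def illumination(p, F):
    """Illumination g(p), eq. (12), on 0 <= p <= pi; zero for p > pi."""
    p = np.atleast_1d(np.asarray(p, dtype=float))
    F = np.asarray(F, dtype=float)
    m = np.arange(1, len(F) + 1)
    g = (1.0 + 2.0 * np.cos(np.outer(p, m)) @ F) / np.pi
    return np.where(p <= np.pi, g, 0.0)

F = [0.2, -0.05, 0.01]                 # illustrative F_1..F_3 (M = 3)
print(pattern(np.arange(6), F))        # samples F(0), ..., F(5)

# Check eq. (7): the cosine moments of g reproduce F(m).
N = 4000
p = (np.arange(N) + 0.5) * np.pi / N   # midpoint rule on [0, pi]
moments = [np.sum(illumination(p, F) * np.cos(k * p)) * np.pi / N for k in range(4)]
print(moments)
```

The midpoint rule is used because it integrates these low-order trigonometric products exactly, up to rounding.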


1. M nulls are prescribed. Then the sidelobes are completely determined.

2. The maximum sidelobe size is prescribed. We determine the solution by requiring the maximum of each of the first M sidelobes to equal this value. Then the positions of the first M nulls are determined.

3. The general case: we prescribe the positions of k near-nulls and the maximum (equal) size of M − k sidelobes. Then the remaining M − k near-nulls and the other k (maximum) sidelobe sizes are determined.

Case 1

M nulls $u_i$ are prescribed. The advantage of starting with this case is that we can check whether the proposed method works by solving the Taylor problem illustrated in this paper. From (13), recalling that $F_0 = 1$ and setting $u = u_i$ (known) for $i = 1, \dots, M$, we find:

$$\sum_{m=1}^{M} a_m(u_i)\,F_m = -a_0(u_i), \qquad i = 1,\dots,M. \qquad (15)$$

These are M simultaneous linear equations for the $F_m$. We now show how the Taylor examples can be solved by this method, assuming that all we know are the positions of the nulls, $u_i$, $i = 1, \dots, M$. These are given by [4]:

$$u_i = \sigma\sqrt{A^2 + \left(i - \tfrac12\right)^2}, \qquad (16)$$

where $\cosh(\pi A)$ is the sidelobe ratio, and σ, the dilation factor, is given by

$$\sigma = \frac{M+1}{\sqrt{A^2 + \left(M + \tfrac12\right)^2}}. \qquad (17)$$

The procedure is to substitute (16) into (15), solve for the $F_m$, and recover F(u) and g(p) from equations (13) and (12), respectively.

* M corresponds to $\bar n - 1$ in Taylor's paper.

** From here on our method differs considerably from that of T. T. Taylor. Taylor introduces a single stretching parameter for shifting M + 1 zeros. We, on the other hand, use these M degrees of freedom to form the desired pattern.

The results are plotted in figures 2 and 3; they agree with Taylor's results. Note that we have used only the values of the $u_i$, and none of Taylor's machinery for obtaining a closed-form expression. The method will work for any prescribed $u_i$, although its success or accuracy will depend on how the $u_i$ are spaced. The condition number of the matrix in (15) was of order one. As can be seen, the sidelobes decay and are not of equal magnitude.
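Case 1 can be reproduced with a short linear solve. The Python sketch below (function names hypothetical) builds Taylor's nulls from (16)-(17) for a chosen sidelobe level and solves (15). It then compares the resulting $F_m$ with Taylor's closed-form sample values, with $\bar n = M + 1$:

$$F_m = \frac{[(\bar n-1)!]^2}{(\bar n-1+m)!\,(\bar n-1-m)!}\prod_{n=1}^{\bar n-1}\left(1 - \frac{m^2}{u_n^2}\right).$$

Agreement is expected because Taylor's pattern is band-limited. It is therefore exactly of the form (13), and it is the unique solution of (15) whenever the matrix is nonsingular. The 30 dB level and M = 5 are illustrative choices.

```python
import math
import numpy as np

def taylor_nulls(sll_db, M):
    """First M Taylor nulls, eqs. (16)-(17), with nbar = M + 1."""
    R = 10.0 ** (sll_db / 20.0)          # sidelobe ratio cosh(pi A)
    A = math.acosh(R) / math.pi
    sigma = (M + 1) / math.sqrt(A**2 + (M + 0.5) ** 2)
    i = np.arange(1, M + 1)
    return sigma * np.sqrt(A**2 + (i - 0.5) ** 2)

def solve_prescribed_nulls(nulls):
    """Solve eq. (15): sum_m a_m(u_i) F_m = -a_0(u_i), with F_0 = 1."""
    u = np.asarray(nulls, dtype=float)
    m = np.arange(1, len(u) + 1)
    matrix = np.sinc(u[:, None] + m) + np.sinc(u[:, None] - m)   # a_m(u_i)
    return np.linalg.solve(matrix, -np.sinc(u))

def taylor_samples(nulls):
    """Taylor's closed-form F_m for the same nulls (nbar - 1 = M)."""
    M = len(nulls)
    f = math.factorial
    u = np.asarray(nulls, dtype=float)
    return np.array([f(M) ** 2 / (f(M + m) * f(M - m)) * np.prod(1.0 - m**2 / u**2)
                     for m in range(1, M + 1)])

nulls = taylor_nulls(30.0, M=5)
F = solve_prescribed_nulls(nulls)
print(F)
print(taylor_samples(nulls))
```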

We next describe how to insert a double null at some point $u_k$. To do this we use a series expansion of equation (15):

$$\sum_{m=1}^{M} a_m(u_k)\,F_m = -a_0(u_k), \qquad (18)$$

$$\sum_{m=1}^{M} a_m(u_k+\Delta)\,F_m = -a_0(u_k+\Delta), \qquad (19)$$

$$a_m(u_k+\Delta) \approx a_m(u_k) + \Delta\,a_m'(u_k). \qquad (20)$$

Inserting equation (20), and the analogous expansion of $a_0$, into (19) and using (18), we have:

$$\sum_{m=1}^{M} a_m'(u_k)\,F_m = -a_0'(u_k). \qquad (21)$$

We now have two equations, (18) and (21), and can therefore delete two $u_i$ in the neighborhood of $u_k$. The results of such an example are shown in figures 4 and 5. Note that the procedure disturbs the neighboring sidelobes. The null is quite broad, however, which may be useful if a signal from a broad source has to be eliminated.
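The double-null construction (18)-(21) only needs derivatives of the sinc functions: $\frac{d}{du}\mathrm{sinc}(u) = (\cos\pi u - \mathrm{sinc}\,u)/u$. In the Python sketch below, the function names are hypothetical, and so are the single-null positions and $u_k$. Two nulls near $u_k$ are replaced by the pair of equations (18) and (21). The checks confirm that F and F′ both vanish at $u_k$ and that the remaining single nulls are kept.

```python
import numpy as np

def dsinc(x):
    """Derivative of sinc(x) = sin(pi x)/(pi x); equals 0 at x = 0."""
    x = np.asarray(x, dtype=float)
    safe = np.where(x == 0.0, 1.0, x)
    return np.where(x == 0.0, 0.0, (np.cos(np.pi * safe) - np.sinc(safe)) / safe)

def rows(u, M, deriv=False):
    """Rows a_m(u) (or a_m'(u)), m = 1..M, and right-hand side -a_0(u) (or -a_0'(u))."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    m = np.arange(1, M + 1)
    s = dsinc if deriv else np.sinc
    return s(u[:, None] + m) + s(u[:, None] - m), -s(u)

def double_null(single_nulls, uk):
    """Solve (15) at the single nulls together with (18) and (21) at uk."""
    M = len(single_nulls) + 2
    A1, b1 = rows(single_nulls, M)          # eq. (15)
    A2, b2 = rows([uk], M)                  # eq. (18)
    A3, b3 = rows([uk], M, deriv=True)      # eq. (21)
    return np.linalg.solve(np.vstack([A1, A2, A3]), np.concatenate([b1, b2, b3]))

def F_and_dF(u, F):
    """Pattern (13) and its derivative at a point u, with F_0 = 1."""
    A, b = rows([u], len(F))
    dA, db = rows([u], len(F), deriv=True)
    return (A @ F)[0] - b[0], (dA @ F)[0] - db[0]

F = double_null([1.4, 2.3, 4.6], uk=3.5)    # hypothetical null layout, M = 5
print(F, F_and_dF(3.5, F))
```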

Case 2

Make the first M sidelobes all the same size. The success of Dolph-Tchebycheff synthesis was due to its equal-sidelobe design. Taylor's modification, the stretching of the nulls to reduce the singularities at the aperture edges, leads to unequal sidelobes. The following procedure is suggested by Taylor's example in figure 2, where the sidelobe maxima occur approximately halfway between the nulls. A more exact analysis may not be necessary because of the iteration procedure. Suppose the Taylor nulls are at $u_1, \dots, u_M$. Iterate as follows:

Step 1. Compute the following approximation to the positions of the maxima:

$$y_i = \tfrac12(u_i + u_{i+1}), \qquad i = 1,\dots,M, \qquad (22)$$

with $u_{M+1} = M + 1$, the first of the fixed integer zeros.
Step 2. Suppose that the required sidelobe level is k. Solve the following equations, derived from equation (13):

$$\sigma_i k = F_0\,a_0(y_i) + \sum_{m=1}^{M} a_m(y_i)\,F_m, \qquad i = 1,\dots,M, \qquad (23)$$

where $\sigma_i = 1$ for i even and $\sigma_i = -1$ for i odd.

Step 3. Reconstruct the radiation pattern and compute the new nulls, $u_1', \dots, u_M'$.

Repeat steps 1 and 2 with $u_i'$ in place of $u_i$, and continue until the $u_i^{(n)}$ converge.


For the cases considered, convergence was found to be extremely fast. The results for the radiation pattern and the illumination function are given in figures 6 and 7. For design purposes, Table 1 gives the location of the first zero for several sidelobe levels, after about the fifth iteration. The diagrams show that the beam is a little sharper than Taylor's: the beamwidth is narrower because the zeros have shifted toward the center.
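The Case 2 iteration (Steps 1-3) can be sketched in Python as follows. The sketch makes several assumptions beyond the text:

- the fixed zero $u_{M+1} = M + 1$ closes the last midpoint in (22);
- new nulls are found by bisection between consecutive sample points $y_{i-1}, y_i$ (with $y_0 = 0$), where (23) guarantees a sign change;
- the stopping tolerance and iteration cap are arbitrary.

Function names are hypothetical. This illustrates the procedure; it is not the authors' code.

```python
import math
import numpy as np

def taylor_nulls(sll_db, M):
    """Starting nulls from eqs. (16)-(17), with nbar = M + 1."""
    A = math.acosh(10.0 ** (sll_db / 20.0)) / math.pi
    sigma = (M + 1) / math.sqrt(A**2 + (M + 0.5) ** 2)
    return sigma * np.sqrt(A**2 + (np.arange(1, M + 1) - 0.5) ** 2)

def pattern(u, F):
    """Eq. (13) at a scalar u, with F_0 = 1."""
    m = np.arange(1, len(F) + 1)
    return float(np.sinc(u) + np.sum((np.sinc(u + m) + np.sinc(u - m)) * F))

def bisect(f, a, b, tol=1e-13):
    """Root of f in [a, b], assuming f(a) and f(b) have opposite signs."""
    fa = f(a)
    while b - a > tol:
        c = 0.5 * (a + b)
        fc = f(c)
        if fa * fc <= 0.0:
            b = c
        else:
            a, fa = c, fc
    return 0.5 * (a + b)

def equal_sidelobes(nulls, k, tol=1e-10, max_iter=100):
    """Steps 1-3 of Case 2: drive the first M sidelobes to level k."""
    u = np.asarray(nulls, dtype=float)
    M = len(u)
    m = np.arange(1, M + 1)
    signs = np.where(m % 2 == 0, 1.0, -1.0)              # sigma_i in eq. (23)
    for it in range(1, max_iter + 1):
        y = 0.5 * (u + np.append(u[1:], M + 1.0))       # Step 1, eq. (22)
        A = np.sinc(y[:, None] + m) + np.sinc(y[:, None] - m)
        F = np.linalg.solve(A, signs * k - np.sinc(y))  # Step 2, eq. (23)
        edges = np.concatenate(([0.0], y))              # Step 3: one sign change per bracket
        new_u = np.array([bisect(lambda x: pattern(x, F), edges[i], edges[i + 1])
                          for i in range(M)])
        done = np.max(np.abs(new_u - u)) < tol
        u = new_u
        if done:
            break
    return u, F, y, it

k = 10.0 ** (-30.0 / 20.0)                              # illustrative -30 dB level
u, F, y, iterations = equal_sidelobes(taylor_nulls(30.0, 5), k)
print(u, iterations)
```

The sketch samples the pattern at the midpoints, as in (22), rather than at the true sidelobe peaks. So on convergence the sidelobes are equal at the $y_i$, which the text argues is close enough because of the iteration.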

Case 3

We now present the general case; it should be clear how to generalize. The k near-nulls give us k equations in a straightforward way. The other nulls are determined by the iteration procedure of Case 2, but only M − k sidelobes can then be made equal to a prescribed maximum value. Figures 8, 9 and 10 show how regions can be made very close to zero by placing several consecutive near-nulls.


EXTENSION OF THE METHOD TO CERTAIN TWO-DIMENSIONAL CASES

The Taylor pattern for the axially symmetric problem has been derived in [8] and [9]. The numerical method described above can easily be extended to this case.

The remainder of this section is devoted to a two-dimensional problem with a rectangular aperture. Using spherical coordinates θ, φ and a rectangular illumination region of area A = 4ab, consider the following generalization of equation (1):

$$ \Pi(x,y) \propto \frac{e^{ikR}}{R} \iint_A I(\xi,\eta)\, e^{ik(\xi\sin\theta\cos\phi + \eta\sin\theta\sin\phi)}\, d\xi\, d\eta, \qquad (24) $$

where x = R sin θ cos φ and y = R sin θ sin φ. We assume that the current distribution can be represented in the product form

$$ I(\xi,\eta) = I_1(\xi)\, I_2(\eta), \qquad (25) $$

so that Π(x,y) must also be of product form. Using the same symmetric case as before, we have:

$$ \Pi(x,y) \propto \frac{e^{ikR}}{R} \int_0^{a} I_1(\xi)\cos(k\xi\sin\theta\cos\phi)\, d\xi \int_0^{b} I_2(\eta)\cos(k\eta\sin\theta\sin\phi)\, d\eta. \qquad (26) $$

Let πξ = aα, πη = bβ, 2a/λ = N/2 and 2b/λ = M/2. Equation (26) becomes:

$$ \Pi(x,y) \propto \frac{e^{ikR}}{R} \int_0^{\pi} I_1(\alpha)\cos(u\alpha)\, d\alpha \int_0^{\pi} I_2(\beta)\cos(v\beta)\, d\beta, \qquad (27) $$

where u = (N/2) sin θ cos φ and v = (M/2) sin θ sin φ. Equation (27) is analogous to equation (7), and we use the same procedure to solve it. Some of the results of the iteration procedure for the two-dimensional cases are illustrated in Figures 11 and 12.


As mentioned in the introduction, array thinning and subarraying require the discretization of the illumination function. For adaptive beamforming of a very large array, the numerical computation for each element may become prohibitive, and subarraying may become necessary. A simple method called product integration can be illustrated as follows. Consider the space factor f(u) with illumination function i(x):

$$ f(u) = \int_{-\pi}^{\pi} i(x)\, e^{ixu}\, dx \qquad (28) $$

For discretization, approximate i(x) by a constant $I_r$ in the r-th interval $[x_{r-1}, x_r]$, r = 1, ..., R, where $-\pi = x_0 < x_1 < \dots < x_R = \pi$. Equation (28) then becomes:

$$ f(u) \approx \sum_{r=1}^{R} \int_{x_{r-1}}^{x_r} I_r\, e^{ixu}\, dx = \sum_{r=1}^{R} I_r\, f_r(u) \qquad (29) $$

where

$$ f_r(u) = \int_{x_{r-1}}^{x_r} e^{ixu}\, dx = \frac{e^{i x_r u} - e^{i x_{r-1} u}}{iu}. \qquad (30) $$

Using special values of $I_r$, it can easily be shown that the above method gives some well-known solutions (see, for example, [10]). In our case we apply the above procedure to the illumination function given by equation (11).
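Here is a minimal Python sketch of the product-integration sum (29)-(30). It assumes R equal-width subintervals of [-π, π] and the kernel $e^{ixu}$ of (28), and the function name `product_integration` is only illustrative. With uniform illumination the sketch reproduces the familiar pattern 2 sin(πu)/u, which is one of the well-known special cases mentioned above.

```python
import cmath
import math


def product_integration(I, u, a=-math.pi, b=math.pi):
    """Space factor f(u) = sum_r I_r f_r(u) from eqs. (29)-(30).

    I : constant illumination levels I_r, one per subinterval.
    Assumption for this sketch: R equal-width subintervals of [a, b].
    """
    R = len(I)
    h = (b - a) / R
    if u == 0:
        # f_r(0) is simply the subinterval width h
        return complex(h * sum(I))
    total = 0j
    for r, I_r in enumerate(I):
        x_prev = a + r * h          # x_{r-1}
        x_r = x_prev + h            # x_r
        # closed-form integral of e^{ixu} over [x_{r-1}, x_r], eq. (30)
        f_r = (cmath.exp(1j * x_r * u) - cmath.exp(1j * x_prev * u)) / (1j * u)
        total += I_r * f_r
    return total
```

A symmetric step illumination such as `[0.2, 0.6, 1.0, 1.0, 0.6, 0.2]` gives a real space factor, as expected for an even i(x).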


CONCLUSION 

To the best of our knowledge, the method given in this paper has not been investigated in the literature. The literature on radiation pattern synthesis is extensive; a partial review of various methods can be found in a recent reference [11], where a least-squares approximation is investigated. Owing to its simplicity, the method given in this paper should find wide application in large-array synthesis as well as in adaptive beamforming problems.


APPENDIX  A 

T. T. Taylor obtained the asymptotic behavior of the zeros from a rigorous discussion of the behavior, for large t, of an integral of the form

$$ F(t) = \int_0^{\pi} (\pi^2 - u^2)^{\alpha}\, g(u)\cos(tu)\, du \qquad (A1) $$

where g(u) is a perfectly smooth function. His result can be obtained by evaluating the above integral exactly for g(u) = 1 and then taking the asymptotic value of the integral. The point is that only the behavior at u = π is important in determining the distribution of zeros.


Lemma: If we take g(u) = 1, then the pattern function F(z) has the asymptotic form

$$ F(z) \sim \frac{\Gamma(1+\alpha)\,(2\pi)^{\alpha}}{z^{1+\alpha}}\,\cos\!\left(\pi\left(z - \tfrac{1}{2}(1+\alpha)\right)\right), \qquad z \to \infty, \qquad (A2) $$

for real(z) > 0.
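Proof: Consider

$$ F(t) = \int_0^{\pi} \cos(ut)\,(\pi^2 - u^2)^{\alpha}\, du. \qquad (A3) $$

Using Poisson's integral formula [12]:

$$ J_\nu(x) = \frac{(x/2)^{\nu}}{\sqrt{\pi}\,\Gamma(\nu + 1/2)} \int_0^{\pi} \cos(x\cos\theta)\,\sin^{2\nu}\theta\, d\theta, \qquad (A4) $$

with the substitution u = π cos θ and ν = α + 1/2, we have

$$ F(t) = \frac{\sqrt{\pi}\,\Gamma(1+\alpha)\,\pi^{2\alpha+1}}{2}\left(\frac{2}{\pi t}\right)^{\alpha+1/2} J_{\alpha+1/2}(\pi t). \qquad (A5) $$

Using the one-term asymptotic formula

$$ J_\nu(x) \approx \sqrt{\frac{2}{\pi x}}\,\cos\!\left(x - \frac{\nu\pi}{2} - \frac{\pi}{4}\right), \qquad (A6) $$

we then have:

$$ F(z) \approx \frac{\Gamma(1+\alpha)\,(2\pi)^{\alpha}}{z^{1+\alpha}}\,\cos\!\left(\pi\left(z - \tfrac{1}{2}(1+\alpha)\right)\right). \qquad (A7) $$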
Equation (A7) shows the asymptotic zeros, $z_n \approx n + 1 + \alpha/2$ (n = 0, 1, 2, ...).
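The lemma can be checked numerically. The Python sketch below uses hypothetical function names. It evaluates (A1) with g(u) = 1 by composite Simpson quadrature and compares the result with the leading-order form (A7). For α = 1 the integral also has an elementary closed form, obtained by integrating by parts twice, so the comparison can be made against an exact value.

```python
import math


def F_numeric(t, alpha, n=2000):
    """Composite Simpson approximation of F(t) = int_0^pi (pi^2 - u^2)^alpha cos(t u) du."""
    if n % 2:
        n += 1
    h = math.pi / n

    def f(u):
        # max(..., 0) guards against a tiny negative base from rounding near u = pi
        return max(math.pi ** 2 - u * u, 0.0) ** alpha * math.cos(t * u)

    s = f(0.0) + f(math.pi)
    for j in range(1, n):
        s += (4 if j % 2 else 2) * f(j * h)
    return s * h / 3


def F_asymptotic(z, alpha):
    """Leading-order form (A7)."""
    return (math.gamma(1 + alpha) * (2 * math.pi) ** alpha / z ** (1 + alpha)
            * math.cos(math.pi * (z - 0.5 * (1 + alpha))))


def F_exact_alpha1(t):
    """Closed form of (A1) for alpha = 1 and g(u) = 1."""
    return (-2 * math.pi * math.cos(math.pi * t) / t ** 2
            + 2 * math.sin(math.pi * t) / t ** 3)
```

For α = 1 at t = 200.3, the one-term formula has a relative error of about 0.2 %. This is consistent with a correction of relative order 1/(πt).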
APPENDIX  B 




The integral equation for the determination of the illumination function can also be derived from an energy point of view using Poynting vectors. Energy methods are useful for incorporating the effects of various design requirements in large-array synthesis. In this appendix we derive the form of the integral equation that would arise from energy methods directly from the one given in this paper.

$$ S(v) = k \int_{-a}^{a} e^{ikxv}\, g(x)\, dx, \qquad v = \cos\theta, \quad -a < x < a \qquad (B1) $$

Multiplying (B1) by $e^{-ikvt}$, integrating with respect to v from -1 to 1, and changing the order of integration, we get:

$$ \frac{1}{k}\, F_T(t) = \int_{-a}^{a} g(x)\, \frac{2\sin k(x - t)}{k(x - t)}\, dx \qquad (B2) $$

where

$$ F_T(t) = \int_{-1}^{1} S(v)\, e^{-ikvt}\, dv. $$

Although (B2) has a symmetric kernel and thus quite a different form from the one in equation (7), the eigenvalues are the same, as discussed by Slepian and Pollak [13]. Methods of synthesis based on the analysis of reference [13] are analysed extensively in [14].
Figure 3. Illumination function for the Taylor problem.

Figure 9. Illumination pattern for Figure 8.

Figure 10. Iterated space factor for two near equal

Figure 12. Two-dimensional space factor in plane

OPTIMIZED TAYLOR ZEROES (only 1st shown; Taylor values in parentheses)

n̄      10 dB                  20 dB                     40 dB
5      ?         (.8429)      1.13733    (1.16962)      -
6      .80656    (.82983)     1.12748    (1.15659)      1.8133    (1.8347)
7      .80021    (.82047)     1.12016    (1.14651)
8      .79551    (.81345)     1.114537   (1.138582)     1.8058    (1.8306)
9      .79187    (.80800)     1.110093   (1.132203)     1.8020    (1.8269)
10     .788993   (.803647)    1.1065     (1.1270)       1.7986    (1.8231)

Table 1. Values of the first zeros obtained by numerical analysis with uniform sidelobes, compared with those of the Taylor zeros.
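The parenthesized entries of Table 1 can be reproduced from Taylor's standard n̄-pattern formulas. These formulas are assumed here and are not restated in this section. They are cosh(πA) = η, where η = 10^(SLL/20) is the voltage sidelobe ratio, and u_n = n̄ [(A² + (n - ½)²) / (A² + (n̄ - ½)²)]^(1/2) for n < n̄. A short Python check, in which the function name is hypothetical:

```python
import math


def taylor_zero(sll_db, nbar, n=1):
    """n-th zero (n < nbar) of Taylor's nbar-pattern, in units of u.

    Assumed standard formulas: cosh(pi*A) = 10**(sll_db/20) and
    u_n = nbar * sqrt((A^2 + (n - 1/2)^2) / (A^2 + (nbar - 1/2)^2)).
    """
    eta = 10 ** (sll_db / 20)          # main-lobe to sidelobe voltage ratio
    A = math.acosh(eta) / math.pi
    return nbar * math.sqrt((A * A + (n - 0.5) ** 2) / (A * A + (nbar - 0.5) ** 2))
```

For example, `taylor_zero(20, 8)` agrees with the tabulated Taylor value 1.138582 to within about 1e-6. The optimized zeros in the unparenthesized columns lie closer to the center, which matches the narrower beam noted in the text.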
Abstract

A method is presented for computing the leading eigenvalues (those having the largest real part) and their eigenvectors for large generalized eigenvalue problems. A linear fractional transformation is used to map a group of leading eigenvalues into dominant eigenvalues (those having the largest modulus). The dominant eigenvalues of the transformed problem are computed by Stewart's (1976) Simultaneous Iteration. Each iteration involves a matrix-vector multiplication and the solution of a linear system, which can be done efficiently if the matrices involved are sparse or have some special structure. Convergence properties are similar to those of inverse power iteration: the method requires an estimate of the region in the complex plane containing the desired eigenvalues, and converges rapidly when a good estimate is available. The amount of work is also comparable to that of basic inverse iteration, which is significantly less than that required for a full eigensolution. Examples from hydrodynamic stability demonstrate convergence rates, computation time and the ability to resolve groups of leading eigenvalues simultaneously.

1. Introduction

A generalized eigenvalue problem has the form

$$ A x = \sigma B x, \qquad (1.1) $$

where A, B ∈ ℂ^{n×n} are general complex matrices. In many applications these matrices have some useful structure, such as symmetry or sparsity.

Let the leading eigenvalues of (1.1) be those having the largest real part. The more common term, dominant eigenvalues, refers to those having the largest modulus. In some applications only a few leading eigenvalues of (1.1) are sought. For example, in linear stability problems the real part of σ is the growth rate, and the eigenvectors of the leading eigenvalues represent the most unstable modes.

Traditional methods for solving (1.1) usually involve finding all the eigenvalues using the QZ algorithm (see IMSL or other numerical analysis libraries) and then sorting them by real part. This involves O(n³) work, where n is the order of the matrices, and becomes expensive or impractical for large n; little or no advantage can be taken of sparsity or other structure of A and B.

Several  methods  exist  for  extracting  selected  eigenvalues  and  eigenvectors  of  standard 
eigenvalue  problems,  i.e.  when  B  is  invertible  (see,  for  example,  Golub  and  Van  Loan, 
1983.)  Power  and  Lanczos  methods  compute  the  dominant  eigenvalues;  inverse  iteration 
can  find  the  eigenvalues  closest  to  a  given  point  in  the  complex  plane  and  their 
eigenvectors.  These  are  not  directly  applicable  to  the  problem  of  computing  the  leading 
eigenvalues. 

Recently,  an  integration  method  was  proposed  (Goldhirsch  et  al.  1987)  for  the  leading 
eigenvalues  of  a  standard  eigenvalue  problem.  This  method  is  simple  and  elegant;  however, 
its  convergence  may  become  very  slow  (or,  alternatively,  the  size  of  the  reduced  problem 
may  become  very  large)  if  the  separation  of  the  eigenvalues  is  small.  Another  problem  may 
arise  if  the  problem  is  defective,  i.e.  a  leading  eigenvalue  has  generalized  eigenvectors;  in 
this  case,  the  integration  method  may  return  inconclusive  or  inconsistent  results. 

2.  The  Dominance  Mapping  Method 

This method addresses problems of the form (1.1) that are not solved efficiently by the other methods mentioned above. It works when A or B is singular and for defective problems, it takes full advantage of the structure of the matrices, and it allows some control over convergence rates. There are a few restrictions, however, which are discussed below.

The eigenvalues in the complex σ-plane can be mapped to a λ-plane by the linear fractional transformation

$$ \sigma = \beta + \alpha\,\frac{\lambda - 1}{\lambda + 1} \qquad (2.1) $$

where α is a real positive constant and β a complex constant. The important effect of this mapping is to send the half-plane to the left of Re σ = Re β to the inside of the unit circle in the λ-plane, as seen in Fig. 1. If m leading eigenvalues are required, and we select β such that

$$ \mathrm{Re}(\sigma_i) > \mathrm{Re}(\beta), \quad i = 1, \dots, m; \qquad \mathrm{Re}(\sigma_i) < \mathrm{Re}(\beta), \quad i = m+1, \dots, n, $$

then the corresponding m eigenvalues will be dominant in the λ-plane:


Figure  1:  The  Dominance  Mapping  (2.1) 

The eigenvalue problem for λ is of standard form:

$$ C u \equiv C_1^{-1} C_2\, u = \lambda u \qquad (2.2) $$

where:

$$ C_1 = -[A - (\alpha + \beta) B], \qquad C_2 = A + (\alpha - \beta) B. $$
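As a quick consistency check on (2.1) and (2.2), the Python sketch below maps σ to λ and back; the helper names are illustrative. Solving (2.1) for λ gives λ = (σ - β + α)/(α + β - σ). This is exactly the relation obtained by substituting Ax = σBx into $C_2 x = \lambda C_1 x$, the form of (2.2) after multiplying through by $C_1$.

```python
def sigma_to_lambda(sigma, alpha, beta):
    """Inverse of (2.1): lambda = (sigma - beta + alpha) / (alpha + beta - sigma)."""
    return (sigma - beta + alpha) / (alpha + beta - sigma)


def lambda_to_sigma(lam, alpha, beta):
    """Mapping (2.1): sigma = beta + alpha * (lambda - 1) / (lambda + 1)."""
    return beta + alpha * (lam - 1) / (lam + 1)
```

For the constants used in §4 (α = 0.3, β = 1.0), σ = 1.2 and σ = 1.1 map to λ = 5 and λ = 2. Points with Re σ < Re β land inside the unit circle.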

The problem of computing the leading eigenvalues of (1.1) thus becomes that of computing the dominant eigenvalues of (2.2), and the methods mentioned in §1 can now be applied. We used Stewart's (1976) version of Simultaneous Iteration, which finds dominant invariant subspaces of a general, non-Hermitian, possibly defective C.

A transformation similar to (2.1) was proposed by Jennings (1977) in the context of converting a quadratic eigenvalue problem to standard form. However, Jennings did not take the second step of applying a dominant-eigenvalue method to a transformed problem equivalent to (2.2). To the best of the author's knowledge, no one else has taken that step either.

The mapping constants α and β allow the user some control over the rate of convergence and the order in which the leading eigenvalues emerge during the iteration. The user must have an estimate of where in the complex plane the leading eigenvalues reside; β is set to the left of this region. The point c = β + α is a singular point of (2.1) that maps to infinity in the λ-plane. Eigenvalues close to c therefore map to very large modulus in the λ-plane and converge rapidly during the iteration of (2.2). α should accordingly be set so that c is near the center of the leading region or near the most important eigenvalue.

The following algorithm computes m leading eigenvalues and eigenvectors of (1.1) using the Dominance Mapping and Simultaneous Iteration:

1. estimate the leading region; select α, β;

2. perform the L-U decomposition of $C_1 = -[A - (\alpha + \beta) B]$ (use the structure of A and B);

3. select m initial vectors $U^{(0)} \in \mathbb{C}^{n \times m}$;

4. Simultaneous Iteration on C u = λ u: for each multiplication $u^{(k+1)} = C u^{(k)}$ do:

   4.1 multiply: $v = C_2 u^{(k)}$

   4.2 solve the system: $C_1 u^{(k+1)} = v$

5. map the converged $\lambda_i \to \sigma_i$ using (2.1).
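The steps above can be sketched in Python with NumPy. This is a simplified stand-in and not Stewart's (1976) algorithm. It uses plain subspace iteration with a Rayleigh-Ritz projection, and it calls dense `np.linalg.solve` where the method would factor $C_1$ once, using its structure (step 2). The function name and the test matrices are hypothetical.

```python
import numpy as np


def dominance_mapping_leading(A, B, alpha, beta, m, max_iter=500, tol=1e-12, seed=0):
    """Approximate the m leading eigenvalues of A x = sigma B x via (2.1)-(2.2)."""
    n = A.shape[0]
    C1 = -(A - (alpha + beta) * B)
    C2 = A + (alpha - beta) * B
    rng = np.random.default_rng(seed)
    # Step 3: m random complex starting vectors, orthonormalized
    U, _ = np.linalg.qr(rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m)))
    prev = None
    for _ in range(max_iter):
        W = np.linalg.solve(C1, C2 @ U)       # steps 4.1-4.2: W = C U
        H = U.conj().T @ W                    # m x m Rayleigh-Ritz projection of C
        lam = np.linalg.eigvals(H)
        lam = lam[np.argsort(-np.abs(lam))]   # dominant lambda first
        U, _ = np.linalg.qr(W)                # re-orthonormalize the iterate
        if prev is not None and np.max(np.abs(lam - prev)) < tol:
            break
        prev = lam
    # Step 5: map converged lambda back to sigma with (2.1)
    return beta + alpha * (lam - 1) / (lam + 1)
```

Consider a pencil whose leading eigenvalues are 1.2 and 1.1, as in (4.1), solved with α = 0.3 and β = 1.0. The two returned σ values converge to 1.2 and 1.1, because every other eigenvalue maps inside the unit circle.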


3.  Singularities  in  the  Dominance  Mapping 

The algorithm of §2 may fail in two cases, corresponding to the two singularities of the mapping (2.1): the point σ = c, which maps to infinity in λ, and λ = -1, which maps to infinity in σ.

When |c - σ_i| < ε for some i ≤ m, for a small (machine-dependent) ε, the matrix $C_1$ will be ill-conditioned or numerically singular. This is easily remedied by a small change in α, which does not significantly affect any other properties of the mapping.

When |Im(σ_i - c)| ≫ 1 for some i ≤ m, the corresponding λ-eigenvalue is close to the singular point λ = -1. Its separation from the subdominant eigenvalues inside the unit circle is then small, often so small that convergence is impractical. Some improvement may result from increasing α, but this may decrease the modulus of other dominant λ-eigenvalues and slow down their convergence. Where leading eigenvalues are widely separated in the imaginary direction, it may therefore be necessary to restart the iteration with different β values to resolve separate clusters of leading σ-eigenvalues. An example of this situation appears in §5 below.

4.  Example 

The performance of the DM method can be demonstrated by observing the amount of work needed to resolve a fixed subset of leading eigenvalues as the order of the problem increases. The following example uses tridiagonal matrices of increasing size n, all having two leading eigenvalues:

$$ \sigma_1 = 1.2, \qquad \sigma_2 = 1.1, \qquad \mathrm{Re}(\sigma_i) \le 1 \ \text{for}\ i = 3, \dots, n. \qquad (4.1) $$

Selecting α = 0.3 and β = 1.0 isolates σ_1 and σ_2. The problem was solved three ways: first with the traditional QZ routines (IMSL), then with the DM method treating the matrices as dense, and finally with the DM method taking full advantage of the structure. The results are shown in Figure 2.

The savings in computing time relative to the full eigensolution can be significant. At n = 100, only a fraction of the work is necessary even without exploiting the band structure, and the work is reduced by more than an order of magnitude when the structure is used.




Figure  2:  Work  to  resolve  the  two  leading  eigenvalues  (4.1),  using 
three  solution  algorithms 


5. Application to the Orr-Sommerfeld Equation
The Orr-Sommerfeld equation

$$ (D^2 - \alpha^2)^2 \phi = i\alpha R\,\left[(U - c)(D^2 - \alpha^2)\phi - U''\phi\right] \qquad (5.1) $$

describes the hydrodynamic stability of parallel shear flow (see, for example, Drazin and Reid 1981). High-accuracy eigenvalues were computed by Orszag (1971) for plane Poiseuille flow with R = 10000 (Reynolds number) and α = 1 (streamwise wavenumber). The locations of the twelve least stable eigenvalues are shown in Figure 3.


Figure 3: Poiseuille flow eigenvalues. R = 10000, α = 1.0


Equation (5.1) is discretized using central differences. A spectral method may be more appropriate in this specific case, as in Orszag (1971), but the banded finite-difference matrix is a good example of a candidate problem for the DM method. The eigenvalue c is replaced by σ = -ic to conform with the definitions of (1.1).

When Im(β) ≈ Im(σ_1) (the upper dashed lines in Fig. 3), eigenvalues 1 and 4 were the first to converge. Eigenvalues 2, 3, 5 and 6 took longer, because their separation in imaginary part brought their λ counterparts close to the singular point λ = -1. For Im(β) ≈ Im(σ_2) (the lower dashed lines in Fig. 3), the order was reversed: eigenvalues 2, 3, 5 and 6 converged first, and then 1 and 4. In both cases the first group converged within 10 to 15 iterations, regardless of the number of grid points, while the second group took much longer to converge.

The error associated with convergence of the λ-iteration was not significant in our computations. Using a stopping criterion of ‖Cu - λu‖ ≤ 10⁻³, the leading σ-eigenvalues were converged to at least 5 digits. The discretization error of the finite differencing (compared to Orszag's results) is proportional to Δx², as expected. The time to resolve the most unstable eigenvalue and its discretization error versus the grid resolution are plotted in Figure 4.
ABSTRACT In the first part of this paper we develop some results for the recurring sequence

$$ B(n) = a\,B(n-1) + b\,B(n-2), \qquad a, b \ne 0, \quad B(0) = 1, \quad B(1) = 1, $$

and show a relationship between this sequence and the simple network of resistors known as a ladder network. Then, using certain values for the coefficients a and b, we show a tiling of the plane using this general recurring sequence.

In  the  second  part  of  the  paper,  using  the  Fibonacci  recurring 
sequence  and  Fibonacci  polynomials,  we  investigate  the  paths  of  light  rays 
incident  upon  two  stacked  glass  plates.  We  model  the  number  of  distinct 
paths  of  light  rays,  number  of  reflections  of  light  rays,  and  number  of 
crossings  of  the  interface  between  the  glass  plates  using  both 
homogeneous  and  nonhomogeneous  recurrence  relations.  Once  again,  we  are 
able  to  tile  the  plane  using  these  sequences. 

1. Introduction

The Fibonacci numbers are a sequence of integers in which each successive value is defined to be the sum of the previous two values. The first Fibonacci number is 0 and the second is 1. The recurring sequence of Fibonacci numbers, then, can be described as

$$ F(n) = F(n-1) + F(n-2), \qquad F(0) = 0, \quad F(1) = 1. \qquad (1) $$

The Fibonacci sequence exhibits many interesting mathematical properties. Some of these are:

i) every third Fibonacci number is even,

ii) no two consecutive Fibonacci numbers have a common factor greater than 1,

iii) the sum of any ten consecutive Fibonacci numbers is always divisible by 11 [11].
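The three properties can be spot-checked, though not proved, with a few lines of Python; the names are illustrative. For iii) it helps to note the identity F(n) + F(n+1) + ... + F(n+9) = 11 F(n+6), which the test also confirms over the same range.

```python
from math import gcd


def fibonacci(count):
    """First `count` Fibonacci numbers F(0), F(1), ... from recursion (1)."""
    F = [0, 1]
    while len(F) < count:
        F.append(F[-1] + F[-2])
    return F[:count]


F = fibonacci(60)
# i) F(n) is even exactly when n is a multiple of 3
every_third_even = all((F[n] % 2 == 0) == (n % 3 == 0) for n in range(len(F)))
# ii) consecutive Fibonacci numbers are coprime
consecutive_coprime = all(gcd(F[n], F[n + 1]) == 1 for n in range(len(F) - 1))
# iii) any ten consecutive Fibonacci numbers sum to a multiple of 11
ten_sum_div_11 = all(sum(F[n:n + 10]) % 11 == 0 for n in range(len(F) - 9))
```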

Over the years, many physical and natural phenomena have been identified as exhibiting properties that can be modeled with the Fibonacci sequence. One of the classic applications is a population model. Assume that a pair of a species is isolated and allowed to breed. Assume further that in each time period the original pair gives birth to a new pair. Each new pair matures and, from two time periods after its birth onward, likewise gives birth to a new pair in each period. If we assume no deaths, the number of pairs of the species at the beginning of each time period is a Fibonacci number.

Plant life exhibits many properties that can be modeled by Fibonacci numbers. The number of petals in the flowers of many plants is a Fibonacci number, and the number of spirals in the scale patterns of pine cones is usually a Fibonacci number [12].


The  applications  of  the  Fibonacci  sequence  are  extensive  in  many 
diverse  fields  of  mathematics.  One  area  which  is  especially  rich  in  the  use 
of  Fibonacci  numbers  is  number  theory.  Fibonacci  numbers  are  widely  used 
in  solutions  to  the  problem  of  Diophantus  [6,8].  Fibonacci  numbers  appear  in 
the  analysis  of  Aitken  Acceleration,  a  numerical  analysis  procedure  which 
speeds  up  the  convergence  of  some  sequences  [13]. 

One of the better-known applications of Fibonacci numbers is in the optimization of functions of a single variable. One very popular method for solving the associated line search problem is the Fibonacci search method. Assuming only that a function g(x) is unimodal on an interval, the Fibonacci search successively selects N < ∞ measurement points so that one can determine the smallest possible region of uncertainty in which the modal value must lie. If d_1 is the initial width of the interval of uncertainty, then after k ≤ N measurements

$$ \frac{F(N-k+1)}{F(N)}\, d_1 \qquad (2) $$

is the width of the interval, where the integers F(m) are the Fibonacci numbers generated from the recursion in (1).
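The following Python sketch shows one textbook variant of Fibonacci search for a minimum. It is an assumption on our part, and its indexing may differ from that of (2). Each comparison shrinks the bracket by the ratio F(m-1)/F(m) and reuses one interior point, so each step costs only one new evaluation of g. The function name is hypothetical.

```python
def fibonacci_search(g, a, b, n):
    """Bracket the minimizer of a unimodal g on [a, b] using n evaluations of g.

    The returned bracket has width 2 (b - a) / F(n + 1), with F(0) = 0, F(1) = 1.
    """
    if n < 3:
        raise ValueError("need at least 3 evaluations")
    F = [0, 1]
    while len(F) <= n + 1:
        F.append(F[-1] + F[-2])
    m = n + 1
    x1 = a + F[m - 2] / F[m] * (b - a)
    x2 = a + F[m - 1] / F[m] * (b - a)
    g1, g2 = g(x1), g(x2)
    while m > 3:
        m -= 1
        if g1 < g2:                      # minimizer lies in [a, x2]
            b, x2, g2 = x2, x1, g1       # old x1 becomes the new right point
            x1 = a + F[m - 2] / F[m] * (b - a)
            g1 = g(x1)
        else:                            # minimizer lies in [x1, b]
            a, x1, g1 = x1, x2, g2       # old x2 becomes the new left point
            x2 = a + F[m - 1] / F[m] * (b - a)
            g2 = g(x2)
    return a, b
```

Take g(x) = (x - 0.3)² on [0, 1] with n = 30 evaluations. The returned bracket contains 0.3 and has width 2/F(31) ≈ 1.5 × 10⁻⁶. To locate a maximum instead, pass -g.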

A  practical  use  of  the  Fibonacci  search  was  reported  by  Braverman 
[10]  to  locate  the  sample  size  that  maximized  the  expected  net  gain  from 
sampling  in  a  Bayesian  decision  problem. 

Some extensions of Fibonacci numbers and Fibonacci polynomials are related to certain classes of discrete probability distributions that have important applications in reliability theory. The Fibonacci numbers of order k are defined as

$$ F^{(k)}(n) = \begin{cases} F^{(k)}(n-1) + \cdots + F^{(k)}(1), & 2 \le n \le k+1, \\ F^{(k)}(n-1) + \cdots + F^{(k)}(n-k), & n \ge k+2, \end{cases} \qquad (3) $$

with $F^{(k)}(0) = 0$ and $F^{(k)}(1) = 1$.
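Definition (3) translates directly into a short recursion. In the Python sketch below, whose function name is hypothetical, k = 2 recovers the ordinary Fibonacci numbers and k = 3 the Tribonacci numbers. For 2 ≤ n ≤ k+1 each term is the sum of all earlier terms, so $F^{(k)}(n) = 2^{n-2}$ in that range.

```python
def fibonacci_order_k(k, count):
    """F^(k)(0), ..., F^(k)(count - 1) from definition (3)."""
    F = [0, 1]
    for n in range(2, count):
        if n <= k + 1:
            F.append(sum(F[1:n]))        # F(n-1) + ... + F(1)
        else:
            F.append(sum(F[n - k:n]))    # F(n-1) + ... + F(n-k)
    return F[:count]
```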


Fibonacci  polynomials  of  order-k  are  likewise  defined  as 


with $F_0^{(k)}(x) = 0$ and $F_1^{(k)}(x) = 1$.

The  Fibonacci  polynomials  are  obtained  from  (4)  by  setting  k  =  2.  In  order  to 
obtain  the  Fibonacci  numbers  of  order  k,  set  x  =  1  in  (4). 

Now  the  Fibonacci  polynomials  of  order-k  have  been  generalized  to  a 
Fibonacci-Type  polynomial  of  order-k  [15]  as  follows 
$$ \text{(one expression for } 2 \le n \le k+1 \text{, another for } n \ge k+2\text{)} \qquad (5) $$


It is polynomials of the form (5) that occur in certain discrete density functions [15]. For example, let X be a random variable with the geometric distribution of order k and parameter p ∈ (0,1). We have

$$ P(X = n+k) = \cdots, \qquad n \ge 0. \qquad (6) $$

Likewise, let L_n be the random variable representing the length of the longest string of numbers in n ≥ 1 Bernoulli trials. We have

$$ P(L_n \le k) = \cdots, \qquad 0 \le k \le n. \qquad (7) $$

The reliability of a system may be increased without duplicating the system by using a "consecutive-k-out-of-n: F system" [16]. This is a system that fails if and only if at least k consecutive components fail, where 1 ≤ k ≤ n. Such systems were first introduced in connection with telecommunications and oil pipeline systems.

One simple result is cited. Suppose the system consists of n components arrayed linearly that operate independently and identically with probability p. The probability of failure is then one minus the probability that n trials contain no string of k failures:

$$ P(F) = 1 - P(L_n \le k-1) = 1 - \cdots, \qquad 0 \le k \le n. \qquad (8) $$
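The failure probability in (8) can also be computed directly with a small Markov-chain recursion on the length of the current run of failed components. The Python sketch below assumes that each component fails independently with probability q and that k ≥ 1; all names are hypothetical. A brute-force enumeration is included to check the recursion.

```python
from itertools import product


def prob_no_run(n, k, q):
    """P(no k consecutive failures among n components, each failing with prob. q)."""
    # state[j] = probability that the current trailing failure run has length j (< k)
    state = [1.0] + [0.0] * (k - 1)
    for _ in range(n):
        new = [0.0] * k
        new[0] = (1 - q) * sum(state)      # a working component resets the run
        for j in range(k - 1):
            new[j + 1] = q * state[j]      # a failure extends the run; runs reaching k drop out
        state = new
    return sum(state)


def system_failure_prob(n, k, q):
    """P(F) for a linear consecutive-k-out-of-n:F system."""
    return 1.0 - prob_no_run(n, k, q)


def brute_force_failure(n, k, q):
    """Enumerate all 2**n component states (1 = failed) to check the recursion."""
    total = 0.0
    for states in product((0, 1), repeat=n):
        run = longest = 0
        for s in states:
            run = run + 1 if s else 0
            longest = max(longest, run)
        if longest >= k:
            failed = sum(states)
            total += q ** failed * (1 - q) ** (n - failed)
    return total
```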
The Fibonacci numbers and polynomials also occur in models, and in their solutions, in the fields of engineering and applied physical science. In this paper we discuss two such applications.

For the first application we use the recurring sequence

$$ B(n) = a\,B(n-1) + b\,B(n-2), \qquad a, b \ne 0, \quad B(0) = 1, \quad B(1) = 1, \qquad (9) $$

to model the simple network of resistors known as a ladder network. Then, using certain values for the coefficients a and b, we show a tiling of the plane using this general recurring sequence.


For  the  second  application  we  use  a  recurring  sequence  similar  to  (1) 
and  Fibonacci  polynomials  to  investigate  the  paths  of  light  rays  incident 
upon  two  stacked  glass  plates.  We  model  the  number  of  distinct  paths  of 
light  rays,  number  of  reflections  of  light  rays,  and  number  of  crossings  of 
the  interface  between  the  glass  plates  using  homogeneous  and 
nonhomogeneous  recurrence  relations.  Once  again,  we  are  able  to  tile  the 
plane  using  these  sequences  to  demonstrate  a  geometric  interpretation. 


The ladder network of resistors shown in Figure 1 is an important network in communication systems [2,9]. The following definitions are provided:

a) the resistance of the i-th resistor = $R_i$,

b) the voltage across the i-th resistor = $e_i$,

c) the attenuation (input voltage/output voltage) = A,

d) the output impedance = $Z_0$,

e) the input impedance = $Z_1$.




Figure  1:  Schematic  showing  location  of  resistors  in  a  ladder  network. 

A model of ladder networks obtained from Kirchhoff's and Ohm's laws was developed in [2,9,14]. The model for the attenuation is the generalized Fibonacci sequence

$$ B_n = k_1 B_{n-1} + k_2 B_{n-2}, \qquad B_0 = 1, \quad B_1 = k_1. \qquad (10) $$

The coefficients k_1 and k_2 depend on the resistances R_1 and R_2.

In this model, $B_{2n}$ is the attenuation of the circuit with n pairs of R_1 and R_2 resistors.

First, we develop a useful result for solving our model.

The following relationship always holds for (10):

$$ B_n^2 - B_{n-1} B_{n+1} = (-1)^n k_2^{\,n}, \qquad (11) $$

where $B_0 = 1$, $B_1 = k_1$, $B_2 = k_1^2 + k_2$, ... (cf. [2,5]).

Proof.

Another way of stating (10) is

$$ \frac{1}{1 - k_1 x - k_2 x^2} = B_0 + B_1 x + B_2 x^2 + \cdots, \qquad (12) $$

where $x^2 - k_1 x - k_2$ is the auxiliary polynomial of the second-order homogeneous difference equation; therefore, at its roots,

$$ x^2 = k_1 x + k_2. \qquad (13) $$

Now, multiplying (13) through by x, we have

$$ x^3 = k_1 x^2 + k_2 x, $$

and replacing $x^2$ with its value from (13) leads to

$$ x^3 = (k_1^2 + k_2)\,x + k_1 k_2, \quad \text{or} \quad x^3 = B_2 x + k_1 k_2. \qquad (14) $$


Then, continuing in the same way that we obtained (14), we have

$$ x^{n+1} = B_n x + K \qquad (15) $$

where K is a constant (in fact $K = k_2 B_{n-1}$).

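Solving (13), we have the following two roots:

$$ x_1 = \frac{k_1 + \sqrt{k_1^2 + 4k_2}}{2}, \qquad (16a) $$

$$ x_2 = \frac{k_1 - \sqrt{k_1^2 + 4k_2}}{2}. \qquad (16b) $$

In (15), we write

$$ x_1^{n+1} = B_n x_1 + K \qquad (17a) $$

and

$$ x_2^{n+1} = B_n x_2 + K. \qquad (17b) $$

Subtracting (17b) from (17a) and combining terms, we have

$$ B_n = \frac{x_1^{n+1} - x_2^{n+1}}{x_1 - x_2}. \qquad (18) $$

Substituting (18) into the left-hand side of (11) gives $(x_1 x_2)^n = (-k_2)^n$, since $x_1 x_2 = -k_2$. This completes the proof for the case $x_1 \ne x_2$.

When $x_1 = x_2$, then $x_1 = k_1/2$. For this case we find $B_n$ by inspection to be

$$ B_n = (n+1)\left(\frac{k_1}{2}\right)^{n}. \qquad (19) $$

Substitution of (19) into the left-hand side of (11) yields $(k_1/2)^{2n}$. Since $k_2 = -k_1^2/4$ in this case, this equals $(-1)^n k_2^n$, and completes the proof for that case.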
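Both results can be checked in Python; the names below are hypothetical. Identity (11) is checked exactly in integer arithmetic. The closed forms (18) and (19) are checked against the recurrence (10) in floating point. The parameter pairs include a complex-root case (k_1 = 2, k_2 = -5) and two repeated-root cases, where $k_1^2 + 4k_2 = 0$.

```python
import cmath


def ladder_sequence(k1, k2, count):
    """B_0, ..., B_{count-1} from recurrence (10)."""
    B = [1, k1]
    while len(B) < count:
        B.append(k1 * B[-1] + k2 * B[-2])
    return B[:count]


def closed_form(k1, k2, n):
    """B_n from (16a), (16b) and (18), or from (19) when the roots coincide."""
    d = k1 * k1 + 4 * k2
    if d == 0:
        return (n + 1) * (k1 / 2) ** n                 # repeated root, (19)
    disc = cmath.sqrt(d)
    x1, x2 = (k1 + disc) / 2, (k1 - disc) / 2
    return (x1 ** (n + 1) - x2 ** (n + 1)) / (x1 - x2)  # distinct roots, (18)
```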
Now, using (16a), (16b), and (18) or (19), the attenuation for any ladder-network circuit like that in Figure 1 can be explicitly determined.


The result in (11) for the generalized Fibonacci formula (10) can be extended to higher-order recursive equations.

We now consider the generalized Tribonacci formula

$$ T_n = k_1 T_{n-1} + k_2 T_{n-2} + k_3 T_{n-3}, \qquad (20) $$

where k_1, k_2 and k_3 are arbitrary constants, and

$T_0 = 1$, $T_1 = k_1$, $T_2 = k_1^2 + k_2$, $T_3 = k_1^3 + 2k_1k_2 + k_3$, ... (cf. [5]).


Using the same methods that we used to find (17a) and (17b), it is not difficult to show that

$$ x_1^{n+2} = T_n x_1^2 + a x_1 + b \qquad (21a) $$

$$ x_2^{n+2} = T_n x_2^2 + a x_2 + b \qquad (21b) $$

$$ x_3^{n+2} = T_n x_3^2 + a x_3 + b \qquad (21c) $$

where x_1, x_2 and x_3 are the three distinct roots of the auxiliary equation belonging to (20), and a and b are constants.


Now, subtracting (21b) from (21a) and (21c) from (21a), we are left with two equations. Eliminating a between them, we get:

$$ T_n = \frac{E}{F} - \frac{G}{H}, \qquad (22) $$

where

$$ E = x_1^{n+2} - x_3^{n+2}, \qquad (23a) $$

$$ F = (x_1 - x_3)(x_3 - x_2), \qquad (23b) $$

$$ G = x_1^{n+2} - x_2^{n+2}, \qquad (23c) $$

$$ H = (x_1 - x_2)(x_3 - x_2). \qquad (23d) $$

The three roots of the auxiliary polynomial

$$ x^3 - k_1 x^2 - k_2 x - k_3 \qquad (24) $$

of the third-order homogeneous difference equation (20) are found from the cubic formula, beginning with

$$ x_1 = A + B, \qquad (25a) $$

with $x_2$ (25b) and $x_3$ (25c) given by the companion expressions, and with the auxiliary quantities A, B, C and D, equations (26a) to (26d), defined in terms of k_1, k_2 and k_3; here $i = \sqrt{-1}$ and n = 0, 1, 2.


Therefore, an explicit formula for the generalized Tribonacci equation can be obtained using (26), (25), (23) and (22). Note that when k_i = 1 for i = 1, 2, 3, the value of (22) is the nth Tribonacci number.

Higher-order generalized formulas can also be written explicitly once the roots of the corresponding auxiliary polynomials (like (13) and (24)) are known.
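The Python sketch below gives a numerical check of (22)-(23) and assumes the roots are distinct; its names are hypothetical. It obtains the roots of (24) with `numpy.roots` rather than with the explicit cubic formulas, and it compares T_n from (22) with the recurrence (20). With k_1 = k_2 = k_3 = 1 it produces 1, 1, 2, 4, 7, 13, 24, 44, ..., the Tribonacci numbers.

```python
import numpy as np


def tribonacci_recurrence(k1, k2, k3, count):
    """T_0, ..., T_{count-1} from (20) with T_0 = 1, T_1 = k1, T_2 = k1^2 + k2."""
    T = [1, k1, k1 * k1 + k2]
    while len(T) < count:
        T.append(k1 * T[-1] + k2 * T[-2] + k3 * T[-3])
    return T[:count]


def tribonacci_closed_form(k1, k2, k3, n):
    """T_n = E/F - G/H from (22)-(23); assumes the roots of (24) are distinct."""
    x1, x2, x3 = np.roots([1, -k1, -k2, -k3]).astype(complex)
    E = x1 ** (n + 2) - x3 ** (n + 2)
    F = (x1 - x3) * (x3 - x2)
    G = x1 ** (n + 2) - x2 ** (n + 2)
    H = (x1 - x2) * (x3 - x2)
    return (E / F - G / H).real
```

Because T_n is symmetric in the three roots, the order in which `numpy.roots` returns them does not matter.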


To demonstrate the geometry of the model (10) with the simple values k_1 = k_2 = 1, the values of B_n can be used to tile the plane by the arrangement in Figure 2 [3]. Note that each tile is a square. This method of arrangement is simple to follow: as each new tile is added to the region, a full rectangle results.




Figure  2  :  Simple  tiling  of  the  plane  using  the  Fibonacci  sequence. 
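The tiling of Figure 2 can be generated programmatically. The Python sketch below assumes one particular arrangement, in which each new square is attached in turn to the right, top, left and bottom of the current rectangle; the names are hypothetical. The checks confirm that the squares of sides B_0, ..., B_n do not overlap and exactly fill a B_n × B_{n+1} rectangle. This is the geometric content of the identity $B_0^2 + \cdots + B_n^2 = B_n B_{n+1}$ for k_1 = k_2 = 1.

```python
def fibonacci_tiling(count):
    """Place squares of sides 1, 1, 2, 3, 5, ... (count of them, count >= 1).

    Returns the squares as (x, y, side), with (x, y) the lower-left corner,
    plus the bounding rectangle (x0, y0, x1, y1).
    """
    sides = [1, 1]
    while len(sides) < count:
        sides.append(sides[-1] + sides[-2])
    squares = [(0, 0, 1)]
    x0, y0, x1, y1 = 0, 0, 1, 1
    for i, s in enumerate(sides[1:count], start=1):
        edge = i % 4
        if edge == 1:            # attach on the right (rectangle height equals s)
            squares.append((x1, y0, s))
            x1 += s
        elif edge == 2:          # attach on top (rectangle width equals s)
            squares.append((x0, y1, s))
            y1 += s
        elif edge == 3:          # attach on the left
            squares.append((x0 - s, y0, s))
            x0 -= s
        else:                    # attach at the bottom
            squares.append((x0, y0 - s, s))
            y0 -= s
    return squares, (x0, y0, x1, y1)
```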


3. Paths of Light Rays

3.1  Number  of  Distinct  Paths  with  n  Reflections 

The  model  for  the  number  of  distinct  paths  for  light  rays  incident 
upon  two  stacked  glass  plates  which  have  n  reflections  is  the  familiar 
Fibonacci  sequence  [7].  Several  such  possibilities  of  reflections  are  shown 
in  Figure  3. 
Figure 3: Portrayal of the possible scenarios with the given number of reflections


This model can be easily explained. Let P_n be the number of paths having n reflections. Clearly P_0 = 1 and P_1 = 2.


Now, assume bundle A has P_{n-1} rays, each with exactly n - 1 reflections, while bundle B has P_n rays with exactly n reflections, as shown in Figure 3 for n = 0, 1, 2, 3. There is no loss of generality regarding the parity of n, as the following argument remains valid if the diagram in Figure 4 is turned upside down.


Figure  4:  Portrayal  of  two  bundles  of  rays  with  n-1  and  n  reflections 


$P_{n+1}$ is the number of distinct paths yielding $n + 1$ reflections. All of these paths must exit the glass plates in the direction opposite to those in bundle B and in the same direction as those in bundle A. Thus the rays of bundle B are reflected at their exit surface to get bundle B', while those of bundle A are reflected at their exit surface and then again at the interface, in order to exit in the same direction, to get bundle A'.


Figure  5:  New  reflections  needed  from  bundles 
A  and  B  to  achieve  n+1  reflections 

Since A contained all distinct paths with $n - 1$ reflections and B contained all distinct paths with $n$ reflections, the total number of paths with $n + 1$ reflections is determined by

$P_{n+1} = P_n + P_{n-1}$.

These paths are all distinct, since each ray of bundle A' has its last reflection at the interface while each ray of bundle B' has its last reflection at an outside surface. Since the rays of A' and B' are each distinct within their bundles and are also distinct from the paths in the other bundle, it follows that there is no duplication in bundle A' + B'.
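The counting argument can be checked by brute force. The sketch below models the upper surface, the interface, and the lower surface as surfaces 0, 1, and 2, lets a ray either reflect or pass at every surface it meets, and enumerates all paths with exactly $n$ internal reflections. Ignoring the outside reflection of the incoming ray is our reading of $P_0 = 1$, $P_1 = 2$; the function names are hypothetical. The same enumeration also yields the interface crossings and the interface and upper-surface reflection counts discussed below.

```python
def reflection_paths(n):
    """All ray paths through two stacked plates with exactly n reflections.

    Surfaces: 0 = upper surface, 1 = interface, 2 = lower surface.
    A path is the list of events after the ray enters through the upper
    surface: ("R", s) is a reflection at surface s and ("T", 1) a crossing
    of the interface. The outside reflection of the incoming ray is ignored.
    """
    paths = []

    def walk(surface, direction, remaining, events):
        # The ray has just reached `surface`, moving down (+1) or up (-1).
        if surface == 1:
            # At the interface it may pass through to the next surface...
            walk(surface + direction, direction, remaining, events + [("T", 1)])
        elif remaining == 0:
            # ...and at an outer surface it may leave the plates.
            paths.append(events)
        if remaining > 0:
            # Alternatively it reflects and reverses direction.
            walk(surface - direction, -direction, remaining - 1,
                 events + [("R", surface)])

    walk(1, 1, n, [])  # after entering, the ray travels down to the interface
    return paths


def counts(n):
    """(P_n, C_n, I_n, U_n): paths, interface crossings, interface
    reflections, and upper-surface reflections over all n-reflection paths."""
    paths = reflection_paths(n)
    events = [e for path in paths for e in path]
    return (len(paths), events.count(("T", 1)),
            events.count(("R", 1)), events.count(("R", 0)))


print(counts(3))  # (5, 10, 5, 4)
```

Checking $P_{n+1} = P_n + P_{n-1}$ against `counts(n)[0]` for the first several $n$ confirms the bundle argument above.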

3.2  The  Crossing  Numbers 

Suppose, as a further investigation, we ask how many times, $C_n$, the rays in the bundle with $n$ reflections cross the interface between the two plates. Clearly, from Figure 3, $C_0 = 1$, $C_1 = 2$, $C_2 = 5$. From the mechanics of propagation shown in Figure 5, the number of crossings by bundle A is $C_{n-1}$ and the number of crossings by bundle B is $C_n$. Bundle A' has the same number of crossings as bundle A, while bundle B' has all the crossings of B plus one extra crossing for each path in bundle B. Thus

$C_{n+1} = C_n + C_{n-1} + P_n$,   (27)

which  can  also  be  used  to  tile  the  plane.  Using  the  method  from  [4]  to  tile 
this  sequence,  the  resulting  tiling  for  the  crossing  numbers  is  shown  in 
Figure  6. 
Figure 6: Tiling the plane using the values of $C_n$ and $F_n$ (not to scale).


Under this construction, the tiling sequence starts with $C_0 = 1$, $C_1 = 2$, and $F_3 = 2$, and then continues in the manner shown in Figure 6.
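The crossing numbers used in the tiling can be generated by iterating $C_{n+1} = C_n + C_{n-1} + P_n$ alongside $P_{n+1} = P_n + P_{n-1}$. This is a minimal sketch starting from $C_0 = 1$, $C_1 = 2$, $P_0 = 1$, $P_1 = 2$; the function name is hypothetical.

```python
def crossing_numbers(count):
    """C_0, ..., C_{count-1} from C_{n+1} = C_n + C_{n-1} + P_n."""
    P = [1, 2]   # P_0, P_1: distinct paths with 0 and 1 reflections
    C = [1, 2]   # C_0, C_1
    while len(C) < count:
        n = len(C) - 1                   # next value is C_{n+1}
        C.append(C[n] + C[n - 1] + P[n])
        P.append(P[n] + P[n - 1])        # keep P_{n+1} = P_n + P_{n-1} in step
    return C[:count]


print(crossing_numbers(9))  # [1, 2, 5, 10, 20, 38, 71, 130, 235]
```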

The filler rectangles that fill in the gaps left after placing the square tiles can also be determined. We redraw the tiling in Figure 6 using sequential notation rather than actual numbers to show these gap-filling rectangles. We use the notation H and V to denote filler rectangles that are oriented horizontally and vertically, respectively.
Figure 7: Tiling of the plane showing filler rectangles (not to scale).
One can read off the following rectangle sizes from Figure 7:

$H_{0,2}$ is $(C_3 - F_5)$ by $F_5$.

$H_{1,2}$ is $(C_5 - F_7)$ by $F_7$.

$H_{2,2}$ is $(C_7 - F_9)$ by $F_9$.

$V_{0,2}$ is $F_4$ by $(C_2 - F_4)$.

$V_{1,2}$ is $F_6$ by $(C_4 - F_6)$.

$V_{2,2}$ is $F_8$ by $(C_6 - F_8)$.

$V_{3,2}$ is $F_{10}$ by $(C_8 - F_{10})$.

In general,

$H_{n,2}$ is $(C_{2n+3} - F_{2n+5})$ by $F_{2n+5}$

and

$V_{n,2}$ is $F_{2n+4}$ by $(C_{2n+2} - F_{2n+4})$.
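The general filler-rectangle formulas can be evaluated directly. This sketch assumes, following the listed cases, that $H_{n,2}$ is $(C_{2n+3} - F_{2n+5})$ by $F_{2n+5}$ and $V_{n,2}$ is $F_{2n+4}$ by $(C_{2n+2} - F_{2n+4})$, with $F_1 = F_2 = 1$ and the crossing numbers generated by $C_{n+1} = C_n + C_{n-1} + P_n$. Dimensions are returned in the order written in the text; the helper names are hypothetical.

```python
def fibonacci(count):
    """F_0, ..., F_{count-1} with F_0 = 0, F_1 = F_2 = 1."""
    F = [0, 1]
    while len(F) < count:
        F.append(F[-1] + F[-2])
    return F[:count]


def crossing_numbers(count):
    """C_0, ..., C_{count-1} from C_{n+1} = C_n + C_{n-1} + P_n."""
    P, C = [1, 2], [1, 2]
    while len(C) < count:
        n = len(C) - 1
        C.append(C[n] + C[n - 1] + P[n])
        P.append(P[n] + P[n - 1])
    return C[:count]


def filler_rectangles(n):
    """Dimensions of H_{n,2} and V_{n,2}, in the order written in the text."""
    C = crossing_numbers(2 * n + 4)    # needs C_{2n+3}
    F = fibonacci(2 * n + 6)           # needs F_{2n+5}
    H = (C[2 * n + 3] - F[2 * n + 5], F[2 * n + 5])
    V = (F[2 * n + 4], C[2 * n + 2] - F[2 * n + 4])
    return H, V


for n in range(3):
    print(n, filler_rectangles(n))
# 0 ((5, 5), (3, 2))
# 1 ((25, 13), (8, 12))
# 2 ((96, 34), (21, 50))
```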
3.3 Number of Reflections at the Interface

The model for the number of reflections at the interface, $I_n$, is also easily formed from a recursive sequence. Clearly, from Figure 3, $I_0 = 0$, $I_1 = 1$, $I_2 = 2$. In this case, the number of reflections is determined by

$I_n = I_{n-1} + I_{n-2} + F_n$, with $I_0 = 0$ and $I_1 = 1$.
This  number  pattern  is  generated  from 

X 

and the tiling using $I_n$ is shown in Figure 8.


Figure 8: Tiling of the plane using values for $I_n$.
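A minimal sketch of the interface-reflection recurrence $I_n = I_{n-1} + I_{n-2} + F_n$ with $I_0 = 0$, $I_1 = 1$ and $F_1 = F_2 = 1$ follows. It lists the values used for the tiling; the function name is hypothetical.

```python
def interface_reflections(count):
    """I_0, ..., I_{count-1} from I_n = I_{n-1} + I_{n-2} + F_n, I_0 = 0, I_1 = 1."""
    F = [0, 1]                      # F_0 = 0, F_1 = F_2 = 1
    while len(F) < count:
        F.append(F[-1] + F[-2])
    I = [0, 1]
    for n in range(2, count):
        I.append(I[n - 1] + I[n - 2] + F[n])
    return I[:count]


print(interface_reflections(8))  # [0, 1, 2, 5, 10, 20, 38, 71]
```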
3.4 Number of Reflections at the Upper Surface

Many other reflection counts for this problem can be modeled with recursive sequences, and the plane can be tiled in similar ways. Our last example is the number of reflections at the upper surface, $U_n$. Here, the recurrence differs for odd and even $n$ and is written as

$U_{2n} = U_{2n-1} + U_{2n-2} + F_{2n+1}$,

$U_{2n+1} = U_{2n} + U_{2n-1} + F_{2n+1}$,

with $U_0 = 0$, $U_1 = 0$.


The tiling of the plane using the $U_n$ is shown in Figure 9.


Figure 9: Tiling of the plane using values for $U_n$.
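The mixed recurrence for $U_n$ can be iterated in a single loop, since both parities add the same Fibonacci term $F_{2n+1}$ (index $m + 1$ for even $m = 2n$, index $m$ for odd $m = 2n + 1$). This is a minimal sketch with $U_0 = U_1 = 0$; the function name is hypothetical.

```python
def upper_reflections(count):
    """U_0, ..., U_{count-1} with U_0 = U_1 = 0 and
    U_{2n} = U_{2n-1} + U_{2n-2} + F_{2n+1},
    U_{2n+1} = U_{2n} + U_{2n-1} + F_{2n+1}."""
    F = [0, 1]
    while len(F) <= count:
        F.append(F[-1] + F[-2])
    U = [0, 0]
    for m in range(2, count):
        # Both parities add F_{2n+1}: index m + 1 for even m, m for odd m.
        U.append(U[m - 1] + U[m - 2] + F[m + 1 if m % 2 == 0 else m])
    return U[:count]


print(upper_reflections(8))  # [0, 0, 2, 4, 11, 20, 44, 77]
```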
4. Conclusion


While Fibonacci-related sequences have been used to model numerous applications, their use in engineering and science has been limited. We have tried to show a few of the types of applications from science and engineering where their use is appropriate. We have also used the technique of tiling to provide a geometric flavor to these sequences and applications.
APPROXIMATION AND INTERPOLATION FORMULAS
FOR REAL-TIME APPLICATIONS
ABSTRACT. A general scheme is presented for constructing a compactly supported function that requires only finite (and relatively small) storage for processing gridded discrete data in (near) real time. Its attractive features are that incoming data are used directly as filtering coefficients, without matrix inversion, and that the optimal order of approximation is achieved while the data are being interpolated.

1. INTRODUCTION. Recently, there has been much interest in the problem of constructing univariate and multivariate approximation schemes by utilizing a single function $\phi$. The main ingredients in such construction processes are dilation (which we will also call scaling) and translation of the function $\phi$. Problems such as spline approximation and interpolation, realization of neural nodes in neural network structural analysis, synthesis via wavelets, and representation of surfaces by radial basis functions all fall into this category. In this paper, we are concerned with the construction of (near) real-time approximation and interpolation formulas by using $\phi$. More precisely, a compactly supported function $\Psi$ that can be evaluated efficiently at any time instant and spatial position will be constructed from scaling and translation of $\phi$, such that incoming discrete data samples can be used readily together with translates of $\Psi$ to give complete analog information with a minimal number of multiplications and additions, and such that the representation guarantees the optimal order of approximation provided by $\phi$. Since we seek a compactly supported $\Psi$, we must start with a compactly supported function $\phi$, which will be assumed to be piecewise continuous for the sake of convenience and feasibility in applications. Typically, in one variable, $\phi$ is a B-spline, and in the multidimensional setting, $\phi$ may be chosen as an appropriate linear combination of box splines in order to achieve the highest order of approximation and computational efficiency. By replacing $\phi(x)$ with the average of $\phi(x)$ and $\phi(-x)$, if necessary, we may always assume that $\phi$ is symmetric with respect to the origin.

We will call $n = n(\phi) \in \mathbb{Z}_+$ the local order of approximation of $\phi$ if it is the largest

1 Supported by SDIO/IST managed by the U.S. Army Research Office under Contract No. DAAL03-87-K-0025.

2 Supported by DARPA under Contract No. MDA 972-88-C-0047.
integer such that


Here, $C_0^n = C_0^n(\mathbb{R}^s)$ denotes the class of all compactly supported $n$ times continuously differentiable functions in $s$ variables. We are given a set of discrete data $\{f_i\}$, $i \in I$, $I \subset \mathbb{Z}^s$ (e.g., $f_i = f(ih)$, or some partial derivatives, etc., of $f$ at $ih$, $f \in C_0^n(\mathbb{R}^s)$). The objective of this paper is to derive a real-time approximation-interpolation scheme

$s_h(f) = \sum_{i \in I} f_i\, \Psi(h^{-1}(\cdot - ih))$

that satisfies:

(i) $\|f - s_h(f)\| = O(h^n)$, and

(ii) $(s_h(f) - f)(ih) = 0$, $i \in I$,

(or partial derivative versions, etc.) for all $f \in C_0^n(\mathbb{R}^s)$.

2. RESULTS. For convenience, we only state results on interpolation of function values. Our approach is to construct $\Psi$, which has compact support and can be easily "stored", such that

(a) $\Psi(i) = \delta_{i,0}$, $i \in I$, and

(b) $f - \sum_{i \in I} f(ih)\, \Psi(h^{-1}(\cdot - ih)) = O(h^n)$

for all $f \in C_0^n(\mathbb{R}^s)$; consequently, the above objectives (i) and (ii) are achieved. The index set $I$ is assumed to contain 0 and to be homogeneous, in the sense that for any $i \in I$, we have $I - i = I$. In this regard we note that the following conditions are equivalent:

(a) $I - i = I$ for all $i \in I$.

(b) $I$ is a subgroup of $\mathbb{Z}^s$.

(c) $I$ is closed under addition and multiplication by $-1$.

We will assume as well that the quotient group $\mathbb{Z}^s/I$, the group of cosets generated by $I$, is finite; this will be the case if the elements of $I$, considered as elements of $\mathbb{R}^s$, span $\mathbb{R}^s$.

If $I$ is not quite "full", then $\Psi$ can be constructed by using (scaled) translates of $\phi$; but if $I$ is quite "full", then a superspace containing $\phi$ has to be introduced.

For convenience, we will set $h = 1$ and write $s(f) = s_1(f)$. Since the procedure is linear, the general result for arbitrary $h > 0$ is attained by simple scaling. The first step is to construct an appropriate quasi-interpolation formula based on the given data. Details can be found in CD [2, 3].


(1) Quasi-interpolation. We must construct an approximation of the form

$(Qf)(x) = \sum_{j \in \mathbb{Z}^s} \lambda_j f(\cdot + j)\, \phi(x - j)$,

where each $\lambda_j$ is a linear functional such that $\lambda_j f(\cdot + j)$ involves only values $f(k)$, $k \in I$. Thus, $\lambda_j$ must be expressed in terms of point evaluations on the coset $I - j$. We must also choose the $\lambda_j$ such that $Qp = p$ for all $p \in \pi_{n-1}^s$ (the space of all polynomials in $s$ variables with total degree at most $n - 1$). Consequently, by scaling $Qf$, we achieve (i).

We then define a function $\psi$ by rewriting $Qf$ as

$(Qf)(x) = \sum_{k \in I} f(k)\, \psi(x - k)$.

The basic technique to achieve our goal was introduced in our earlier work CD [3]: choose any $\lambda$ such that

$\sum_{j \in \mathbb{Z}^s} \lambda f(\cdot + j)\, \phi(\cdot - j)$

preserves $\pi_{n-1}^s$. Our favorite $\lambda$ is the one obtained by what we called the Neumann series approach in CD [1]. With this $\lambda$, we may now compute $\lambda(p)$. Then we may solve for the $\lambda_j$ by using

$\lambda_j(p) = \lambda(p)$, $p \in \pi_{n-1}^s$.
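In the one-dimensional setting of Example (2) in Section 3 ($\phi$ the centered cubic B-spline, $I = 2\mathbb{Z}$, $\lambda[1\ x\ x^2\ x^3] = [1\ 0\ -\frac{1}{3}\ 0]$), the conditions $\lambda_j(p) = \lambda(p)$ form a small Vandermonde system. The sketch below solves it exactly with Python's `fractions` module. The symmetric node sets $\{-2, 0, 2\}$ and $\{-3, -1, 1, 3\}$ and the helper names are our own choices; with three even nodes only the moments of $1, x, x^2$ are imposed, and the $x^3$ condition then holds by symmetry.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    a = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[r][n] / a[r][r] for r in range(n)]


def coset_functional(nodes, moments):
    """Weights w_k with sum_k w_k * k**m == moments[m] for m < len(nodes)."""
    matrix = [[Fraction(k) ** m for k in nodes] for m in range(len(nodes))]
    return solve(matrix, moments[: len(nodes)])


# lambda[1, x, x^2, x^3] for the centered cubic B-spline quasi-interpolant.
moments = [Fraction(1), Fraction(0), Fraction(-1, 3), Fraction(0)]
even = coset_functional([-2, 0, 2], moments)       # nodes in the coset 2Z
odd = coset_functional([-3, -1, 1, 3], moments)    # nodes in the coset 2Z - 1
print([str(w) for w in even], [str(w) for w in odd])
# ['-1/24', '13/12', '-1/24'] ['-1/12', '7/12', '7/12', '-1/12']
```

Exact rational arithmetic keeps the weights in the same fractional form in which they are quoted in Example (2).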

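We now show that, in general, we need to construct a $\lambda_j$ for each element of the quotient group $\mathbb{Z}^s/I$, by picking a $j$ from each coset. Consider the construction of a quasi-interpolant in the form

$(Qf)(x) = \sum_{k \in I} f(k)\, \psi(x - k)$,

where $\psi$ is a linear combination of translates of $\phi$, i.e.,

$\psi(x) = \sum_{j \in \mathbb{Z}^s} c_j\, \phi(x - j)$.

Substituting this into the expression for $(Qf)(x)$, we obtain

$\sum_{k \in I} f(k)\psi(x - k) = \sum_{k \in I} f(k) \sum_{j \in \mathbb{Z}^s} c_j\, \phi(x - k - j) = \sum_{j \in \mathbb{Z}^s} \sum_{k \in I} c_{j-k}\, f(k)\, \phi(x - j)$

$= \sum_{j \in \mathbb{Z}^s} \Big( \sum_{k \in I} c_{j-k}\, f((k - j) + j) \Big) \phi(x - j)$.

If we identify the coefficient of $\phi(x - j)$ as $\lambda_j f(\cdot + j)$, then $\lambda_j$ is given by

$\lambda_j = \sum_{k \in I} c_{j-k}\, \delta_{k-j} = \sum_{k \in I - j} c_{-k}\, \delta_k$,

where $\delta_k$ denotes point evaluation at $k$.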
Now the index sets $I - j$, for $j \in \mathbb{Z}^s$, are precisely the cosets of $I$; hence for two different $j$ they are either identical or disjoint. We can therefore choose the $c_k$ independently for $k$ in each coset $I - j$, so as to satisfy the requirement that $\lambda_j(p) = \lambda(p)$, $p \in \pi_{n-1}^s$. More precisely, we can now furnish the following algorithm for constructing $\psi(x) = \sum_{k \in \mathbb{Z}^s} c_k\, \phi(x - k)$ such that $(Qf)(x) = \sum_{k \in I} f(k)\, \psi(x - k)$ is a quasi-interpolant:

1) For each coset $I - j$, calculate a $\lambda_j$ in the form $\lambda_j f = \sum_{k \in I - j} a_k^{(j)} f(k)$.

2) Define $c_k = \sum_{j} a_{-k}^{(j)}$, where the sum is over a set of coset representatives $j$ of $\mathbb{Z}^s/I$ (with $a_k^{(j)} = 0$ for $k \notin I - j$).

3) Define $\psi(x) = \sum_{k \in \mathbb{Z}^s} c_k\, \phi(x - k)$.


(2) Choice of a basic cardinal interpolator. Our second step is to construct $\eta$, using only $\phi$ if possible, or else a reasonable superspace containing $\phi$, such that

(c) $\eta(j) = \delta_{j0}$, $j \in I$, and

(d) the support of $\eta$ is small.

It is clear from our assumption on the index set $I$ that if $\eta(j) = \delta_{j0}$, $j \in I$, then $\eta(k - j) = \delta_{kj}$ for $j, k \in I$. One simple way of achieving (c) is for the support of $\eta$ not to overlap with the other sample points $j \in I$. (We are not concerned with the approximation order at this stage.)


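(3) Construction of $\Psi$. From $\psi$ and $\eta$, we may now construct our compactly supported cardinal interpolation function:

$\Psi(x) = \sum_{i \in I} (\delta_{i,0} - \psi(i))\, \eta(x - i) + \psi(x)$

$= \eta(x) + \psi(x) - \sum_{i \in I} \psi(i)\, \eta(x - i)$.

Clearly, $\Psi(j) = \delta_{j0}$ for $j \in I$. Note that since the index set $I$ satisfies $I - i = I$ for all $i \in I$, we may write

$\sum_{k \in I} f(k)\, \Psi(x - k) = \sum_{k \in I} \Big[ f(k) - \sum_{j \in I} f(j)\, \psi(k - j) \Big] \eta(x - k) + \sum_{k \in I} f(k)\, \psi(x - k)$,

which is a noncommutative "blending operator", namely:

$J(f - Qf) + Qf = (J \oplus Q)f$,

with $Jf = \sum_{k \in I} f(k)\, \eta(\cdot - k)$.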
where $\oplus$ denotes the Boolean sum of $J$ followed by $Q$. It is clear that $J \oplus Q \neq Q \oplus J$, since $Jf$ has lost too much information on $f$ and $Q$ is not an interpolation operator. Nonetheless, it is not difficult to show that $J \oplus Q$ not only preserves the same polynomials as $Q$ does (all of $\pi_{n-1}^s$), but also provides the desirable interpolation property of $J$ (cf. CD [3]).

3. EXAMPLES. Our construction procedure is best illustrated by examples. In the following, we give three: the first reconstructs an old example due to Jenkins, the second completes an example considered by DGM [4], and the third utilizes a $C^2$ quartic box spline to give a real-time interpolation formula with fourth order of approximation, which is optimal.

(1) $C^2$-quartic cardinal spline interpolation at $\mathbb{Z}$. (Jenkins [5], cf. Schoenberg [6].)

Our procedure to construct Jenkins' basic cardinal interpolation function $\Psi$ is very simple:

Let $\phi$ be the centered cubic B-spline with knots at $\mathbb{Z}$. Since $I = \mathbb{Z}$, we may choose

$\lambda_j = \lambda$, $j \in \mathbb{Z}$,

where $\lambda f = -\frac{1}{6} f(-1) + \frac{4}{3} f(0) - \frac{1}{6} f(1)$. This gives

$\psi(x) = -\frac{1}{6}\phi(x + 1) + \frac{4}{3}\phi(x) - \frac{1}{6}\phi(x - 1)$.

The function $\Psi$ can now be constructed if we can find a $C^2$-quartic $\eta$ such that

$\eta(0) = 1$ and $\operatorname{supp}(\eta) = [-1, 1]$.

To do so, we simply set

$\eta(x) = (1 + 3|x|)(1 - |x|)^3$ for $|x| \le 1$, and $\eta(x) = 0$ otherwise.
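The Jenkins construction is easy to evaluate numerically. The sketch below builds $\psi$ and $\eta$ as above, forms $\Psi(x) = \eta(x) + \psi(x) - \sum_i \psi(i)\eta(x - i)$, and checks that $\Psi(j) = \delta_{j0}$ and that $\sum_k p(k)\Psi(x - k)$ reproduces a cubic $p$. The piecewise formula for the cubic B-spline is the standard one; the function names are ours.

```python
import math


def cubic_bspline(x):
    """Centered cubic B-spline with integer knots; support [-2, 2]."""
    ax = abs(x)
    if ax < 1:
        return (4 - 6 * ax**2 + 3 * ax**3) / 6
    if ax < 2:
        return (2 - ax) ** 3 / 6
    return 0.0


def psi(x):
    """Quasi-interpolant generator for lambda f = -f(-1)/6 + 4f(0)/3 - f(1)/6."""
    return (-cubic_bspline(x + 1) + 8 * cubic_bspline(x) - cubic_bspline(x - 1)) / 6


def eta(x):
    """C^2 piecewise quartic with eta(0) = 1 and support [-1, 1]."""
    ax = abs(x)
    return (1 + 3 * ax) * (1 - ax) ** 3 if ax <= 1 else 0.0


def cardinal(x):
    """Psi(x) = eta(x) + psi(x) - sum_i psi(i) eta(x - i); psi(i) = 0 for |i| >= 3."""
    return eta(x) + psi(x) - sum(psi(i) * eta(x - i) for i in range(-2, 3))


def interpolate(f, x):
    """s(f)(x) = sum_k f(k) Psi(x - k) over the integers k near x."""
    k0 = math.floor(x)
    return sum(f(k) * cardinal(x - k) for k in range(k0 - 4, k0 + 6))


def p(t):
    return t**3 - 2 * t + 1


print(interpolate(p, 0.3), p(0.3))  # equal up to rounding error
```

Only $f(k)$ for integers $k$ within distance 3 of $x$ enter the sum, which is what makes the formula usable in real time.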

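(2) $C^2$-cubic cardinal spline interpolation at $I = 2\mathbb{Z}$. (Partially worked out using a different method by Dahmen, Goodman, Micchelli [4].)

Again, let $\phi$ be the centered cubic B-spline with knots at $\mathbb{Z}$. Since the sample points are chosen to be $2\mathbb{Z}$, we may now choose $\eta(x) = \frac{3}{2}\phi(x)$, so that

$\eta(j) = \delta_{j0}$, $j \in 2\mathbb{Z}$, and $\operatorname{supp}(\eta) = [-2, 2]$.

Use any $\lambda$ that induces a quasi-interpolant. Then

$\lambda[1\ x\ x^2\ x^3] = [1\ 0\ -\frac{1}{3}\ 0]$.

Since only $f(2k)$, $k \in \mathbb{Z}$, are used, we must construct at least two different $\lambda_j$. We consider even and odd integers, so

$\lambda_e(f) = -\frac{1}{24} f(-2) + \frac{13}{12} f(0) - \frac{1}{24} f(2)$,

$\lambda_o(f) = -\frac{1}{12} f(-3) + \frac{7}{12} f(-1) + \frac{7}{12} f(1) - \frac{1}{12} f(3)$.

With $\lambda_j = \lambda_e$ for even $j$ and $\lambda_j = \lambda_o$ for odd $j$, we have

$\psi(x) = -\frac{1}{12}\phi(x + 3) - \frac{1}{24}\phi(x + 2) + \frac{7}{12}\phi(x + 1) + \frac{13}{12}\phi(x) + \frac{7}{12}\phi(x - 1) - \frac{1}{24}\phi(x - 2) - \frac{1}{12}\phi(x - 3)$.

Hence, we have

$\Psi(x) = \eta(x) + \psi(x) - \sum_{i \in 2\mathbb{Z}} \psi(i)\, \eta(x - i)$

$= \frac{1}{48}\{\phi(x + 4) - 4\phi(x + 2) + 6\phi(x) - 4\phi(x - 2) + \phi(x - 4)\} + \psi(x)$.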
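The three-step algorithm of Section 2 can be traced through this example in code. The sketch below takes the coset functionals $\lambda_e$ (representative $j = 0$) and $\lambda_o$ (representative $j = 1$), forms $c_k = a_{-k}^{(j)}$, builds $\psi$ and $\Psi$, and checks numerically that $\Psi$ agrees with the closed form above, is cardinal on $2\mathbb{Z}$, and reproduces cubics from samples on $2\mathbb{Z}$ only. The dictionaries `LAMBDA` and `COEFF` and all function names are hypothetical.

```python
import math
from fractions import Fraction


def cubic_bspline(x):
    """Centered cubic B-spline with integer knots; support [-2, 2]."""
    ax = abs(x)
    if ax < 1:
        return (4 - 6 * ax**2 + 3 * ax**3) / 6
    if ax < 2:
        return (2 - ax) ** 3 / 6
    return 0.0


# Step 1: coset functionals {node: weight} for representatives j = 0 and j = 1.
LAMBDA = {
    0: {-2: Fraction(-1, 24), 0: Fraction(13, 12), 2: Fraction(-1, 24)},
    1: {-3: Fraction(-1, 12), -1: Fraction(7, 12), 1: Fraction(7, 12), 3: Fraction(-1, 12)},
}

# Step 2: c_k = a^{(j)}_{-k}, summed over the coset representatives j.
COEFF = {}
for weights in LAMBDA.values():
    for node, weight in weights.items():
        COEFF[-node] = COEFF.get(-node, 0) + weight


def psi(x):
    """Step 3: psi(x) = sum_k c_k phi(x - k)."""
    return sum(float(c) * cubic_bspline(x - k) for k, c in COEFF.items())


def eta(x):
    """eta = (3/2) phi: eta(0) = 1 and eta vanishes at the other points of 2Z."""
    return 1.5 * cubic_bspline(x)


def cardinal(x):
    """Psi(x) = eta(x) + psi(x) - sum_{i in 2Z} psi(i) eta(x - i); psi(i) = 0 for |i| >= 6."""
    return eta(x) + psi(x) - sum(psi(i) * eta(x - i) for i in range(-4, 5, 2))


def closed_form(x):
    """(1/48){phi(x+4) - 4phi(x+2) + 6phi(x) - 4phi(x-2) + phi(x-4)} + psi(x)."""
    b = cubic_bspline
    return (b(x + 4) - 4 * b(x + 2) + 6 * b(x) - 4 * b(x - 2) + b(x - 4)) / 48 + psi(x)


def interpolate(f, x):
    """s(f)(x) = sum over k in 2Z of f(k) Psi(x - k); Psi vanishes outside [-6, 6]."""
    k0 = 2 * math.floor(x / 2)
    return sum(f(k) * cardinal(x - k) for k in range(k0 - 8, k0 + 10, 2))


def p(t):
    return t**3 - 2 * t + 1


print(interpolate(p, 0.3), p(0.3))  # equal up to rounding error
```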

(3) Cardinal interpolation at $2\mathbb{Z}^2$ by bivariate $C^2$-quartic splines on the 3-direction mesh.

In this example, we let $\phi$ be the box spline $M_{222}$. The order of approximation is $n = 4$. Since $\phi(2k) = 0$ for all $0 \neq k \in \mathbb{Z}^2$ and $\operatorname{supp}(\phi)$ is contained in $[-2, 2]^2$, we may use

$\eta(x) = 2\phi(x)$.

Using the "Neumann series" to produce $\lambda$ (cf. CD [1]), we have

$\lambda[1\ x\ y\ x^2\ xy\ y^2\ x^3\ x^2y\ xy^2\ y^3] = [1\ 0\ 0\ -\frac{1}{3}\ -\frac{1}{6}\ -\frac{1}{3}\ 0\ 0\ 0\ 0]$.

Since only data at $I = 2\mathbb{Z}^2$ are to be used, we must have four different $\lambda_j$'s. We use $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ as described in Figure 1, with supports in $I_1 = I = \{(\text{even}, \text{even})\}$, $I_2 = I - (1, 0) = \{(\text{odd}, \text{even})\}$, $I_3 = I - (0, 1) = \{(\text{even}, \text{odd})\}$, and $I_4 = I - (1, 1) = \{(\text{odd}, \text{odd})\}$, respectively, so that

$Qf = \sum_{j \in \mathbb{Z}^2} \lambda_j f(\cdot + j)\, \phi(\cdot - j) = \sum_{k=1}^{4} \sum_{j \in I_k} \lambda_k f(\cdot + j)\, \phi(\cdot - j)$,

where, for each $k = 1, \ldots, 4$ and $j \in I_k$, $\lambda_k f(\cdot + j)$ only involves evaluation at the even integers. The functions $\psi$ and $\eta$ are given in Figures 2 and 3, respectively, where the coefficients $c_j$ of $\phi(x - j) = M_{222}(x - j)$ are shown.
Figure 2 (Coefficients $c_j$)


Figure 3 (Coefficients $c_j$)
INTRODUCTION

The problem of fitting a surface to small sets of given data has been addressed in many different ways, and several computer programs are currently available that enable one to deal with the problem effectively. Many of the available methods involve a global interpolation or approximation scheme and often require solving a system of equations with as many unknowns as there are data points. For very large sets of data, the problem becomes computationally intractable. This consideration motivates the development of a way to pare the problem down to a more manageable size.

We wish to construct a function $F$ that approximately fits the data, since we assume the data collection is subject to measurement error. We propose to use approximation by least squares Thin Plate Splines (TPS), where the surface function is constructed so as to minimize an error function subject to certain constraints. Solving the approximation problem will also involve as many equations as there are data points, but the number of unknowns will be significantly smaller. Part of the appeal of TPS approximation lies in the fact that it minimizes a certain functional (the bending energy of the plate) and involves a linear combination of functions with no greater complexity than the natural logarithm of the distance function.

Approximation by least squares TPS is straightforward once the knot coordinates $(X_j, Y_j)$ are known. We employ the TPS function

$F(x, y) = \sum_{j=1}^{K} A_j d_j^2 \log(d_j) + ax + by + c$,

where $d_j^2 = (x - X_j)^2 + (y - Y_j)^2$, and the coefficients $A_j$, $a$, $b$, and $c$ are chosen to minimize the error function

$E = \sum_{i=1}^{N} \{ [F(x_i, y_i) - f_i] / s_i \}^2$.

The ordinates $f_i$ may be subject to random errors, say with standard deviation $s_i$ at the $i$th data point. We model the plate under point loads at the knot points (as opposed to the data points); therefore the constraint equations for the least squares TPS method, which may be thought of as 'equilibrium conditions' on the plate, should be satisfied. Thus, the error function is minimized subject to the constraint equations

$\sum_{j=1}^{K} A_j = 0$, $\sum_{j=1}^{K} A_j X_j = 0$, $\sum_{j=1}^{K} A_j Y_j = 0$.

We  use  LINPACK  [1]  subroutines  to  do  the  actual  calculations. 
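The constrained least squares problem can be written as a bordered (Lagrange multiplier) system $[[G, B^T], [B, 0]]\,[\theta; \mu] = [g; 0]$, where $G$ and $g$ are the weighted normal equations and $B$ holds the three equilibrium constraints. The sketch below is a small dense pure-Python version for illustration only; it does not reproduce the LINPACK routines used in the paper. The names `fit_tps`, `evaluate`, and `solve` are hypothetical, and we take $d^2 \log d = 0$ at $d = 0$.

```python
import math


def tps_basis(x, y, X, Y):
    """d^2 log(d) with d^2 = (x - X)^2 + (y - Y)^2; taken as 0 when d = 0."""
    d2 = (x - X) ** 2 + (y - Y) ** 2
    return 0.5 * d2 * math.log(d2) if d2 > 0 else 0.0


def solve(a, b):
    """Dense Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_tps(data, knots, sigma=None):
    """Least squares TPS fit with the three equilibrium constraints.

    data: [(x, y, f)], knots: [(X, Y)], sigma: [s_i] (defaults to 1).
    Returns (A, (a, b, c)).
    """
    K, N = len(knots), len(data)
    sigma = sigma or [1.0] * N
    rows = [[tps_basis(x, y, X, Y) for X, Y in knots] + [x, y, 1.0] for x, y, _ in data]
    w = [1.0 / s**2 for s in sigma]
    p = K + 3
    # Weighted normal equations G theta = g for theta = (A_1..A_K, a, b, c).
    G = [[sum(w[i] * rows[i][r] * rows[i][c] for i in range(N)) for c in range(p)]
         for r in range(p)]
    g = [sum(w[i] * rows[i][r] * data[i][2] for i in range(N)) for r in range(p)]
    # Constraints: sum A_j = sum A_j X_j = sum A_j Y_j = 0.
    B = [[1.0] * K + [0.0] * 3,
         [X for X, _ in knots] + [0.0] * 3,
         [Y for _, Y in knots] + [0.0] * 3]
    kkt = [G[r] + [B[0][r], B[1][r], B[2][r]] for r in range(p)]
    kkt += [row + [0.0] * 3 for row in B]
    theta = solve(kkt, g + [0.0] * 3)
    return theta[:K], tuple(theta[K:K + 3])


def evaluate(A, abc, knots, x, y):
    a, b, c = abc
    return sum(Aj * tps_basis(x, y, X, Y) for Aj, (X, Y) in zip(A, knots)) + a * x + b * y + c


grid = [(i / 5, j / 5) for i in range(6) for j in range(6)]
knots = [(0.2, 0.3), (0.7, 0.2), (0.4, 0.8), (0.8, 0.7)]
A, abc = fit_tps([(x, y, 2 * x - y + 3) for x, y in grid], knots)
print([round(v, 6) for v in abc])  # a planar data set is recovered exactly
```

Data lying on a plane are fitted with all $A_j$ essentially zero, since the linear part alone already satisfies the constraints with zero error.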

Interpolation of scattered data by the method of TPS was developed from engineering considerations by Harder and Desmarais [2]. It can be thought of as a two-dimensional generalization of the cubic spline, which models a thin beam under point loads subject to equilibrium constraints. The TPS function is derived from a differential equation that gives the deformation of an infinite thin plate under the influence of point loads. A point load is applied at each data point so that the interpolating surface can be constructed as a sum of fundamental solutions of the TPS equation.

In using the least squares TPS approximation method to fit the surface, fewer basis functions than given data points are employed. These basis functions are centered at a different, smaller set of points, which, in analogy with the univariate case, we call the knots. Therefore, the problem at hand is one of selecting the knot points, and hence the basis functions.

Given a 'large' set of data points $(x_i, y_i, f_i)$, $i = 1, \ldots, N$, we wish to find a smaller set of knot points $(X_j, Y_j)$, $j = 1, \ldots, K$, which will 'represent' the former reasonably well. This could be accomplished by choosing a subset of the original set, or by some process that produces a representative set. The ultimate goal is to approximate the surface from which the original data arose using the representative set. Hence, a surface fit to the large set and one fit to the representative set should be essentially the same.

ORIGINAL ALGORITHM

The original algorithm for solving the knot selection problem was developed based on the optimization of one function subject to a constraint formulated in terms of another function. Specifically, we sought to achieve an equal, or as nearly equal as possible, distribution of the data points amongst the knots. To do this, we move the knots around the domain of the fixed data points in search of an optimal configuration of knots that minimizes the quantity

$DSUM = \sum_{j=1}^{K} (N/K - N_j)^2$,

where $N$ is the number of data points, $K$ is the number of knot points, and $N_j$ is the actual number of data points belonging to the $j$th knot. This objective function is the subject of later discussion.

A key advantage of this optimization is the natural heuristic that follows from it for moving the knots around the domain in search of a better configuration. This heuristic rests on the fact that, for most configurations of knots, some knots will own more data points than others, so that a simple mechanism for moving the knots around is realized by moving the knots owning the fewest points toward the knots owning the most.

We also sought to position each knot in such a way that the distances between the data points and their closest knot point were minimized. This was accomplished by minimizing the constraint function

$GN2 = \sum_{i=1}^{N} \min_j [(x_i - X_j)^2 + (y_i - Y_j)^2]$.

Thus our original algorithm would propose a certain configuration of knots, determine which data points belonged to which knot, move the knots to minimize the distances, and check the resulting distribution of data points. Then, depending on how poor the distribution turned out to be, certain knots would be moved in accordance with the searching scheme, and the process would begin all over again.

However, the particular scheme we developed to search for the optimal configuration of knots left a lot to be desired in terms of the excessive computation time required. Hence, one objective of the research effort was to reduce the time spent searching for an optimal configuration of knots, using a better searching scheme and any other means available. Both topics are developed within this paper.

The constraint function GN2 leads naturally to a default Dirichlet tessellation, a partitioning of the plane with respect to the knot points (see Figure 1). Thus, we say each data point belongs to some knot point according to the Dirichlet tile in which it lies. Differentiation of GN2 with respect to the $X_j$ and $Y_j$ shows that, at the minimum, each knot point occupies the centroid of the data points inside its tile. The following theorem applies to this constraining algorithm.

Theorem: The function GN2 decreases with each iteration that involves movement of a knot point. See [3] for a proof.
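The settling step (assign each data point to the knot whose Dirichlet tile contains it, then move each knot to the centroid of its tile) is the classical Lloyd, or k-means, iteration. The sketch below illustrates only that step, not the knot movement heuristic, on random data; it records GN2 after every sweep so the monotone decrease stated in the theorem can be observed. The data, starting knots, and function names are our own choices.

```python
import random


def nearest(point, knots):
    """Index of the knot whose Dirichlet tile contains `point`."""
    x, y = point
    return min(range(len(knots)),
               key=lambda j: (x - knots[j][0]) ** 2 + (y - knots[j][1]) ** 2)


def gn2(data, knots):
    """GN2 = sum over data points of the squared distance to the nearest knot."""
    return sum(min((x - X) ** 2 + (y - Y) ** 2 for X, Y in knots) for x, y in data)


def dsum(data, knots):
    """DSUM = sum_j (N/K - N_j)^2 with N_j the number of points in tile j."""
    owners = [nearest(p, knots) for p in data]
    N, K = len(data), len(knots)
    return sum((N / K - owners.count(j)) ** 2 for j in range(K))


def settle(data, knots, iterations=25):
    """Repeatedly move every knot to the centroid of the data in its tile."""
    history = [gn2(data, knots)]
    for _ in range(iterations):
        owners = [nearest(p, knots) for p in data]
        moved = []
        for j, knot in enumerate(knots):
            tile = [p for p, o in zip(data, owners) if o == j]
            if tile:
                moved.append((sum(x for x, _ in tile) / len(tile),
                              sum(y for _, y in tile) / len(tile)))
            else:
                moved.append(knot)  # an empty tile leaves its knot in place
        knots = moved
        history.append(gn2(data, knots))
    return knots, history


rng = random.Random(1)
data = [(rng.random(), rng.random()) for _ in range(300)]
knots, history = settle(data, data[:12])
print(round(history[0], 4), round(history[-1], 4), dsum(data, knots))
```

Note that settling alone lowers GN2 but does nothing to equalize the tile populations; that is the role of the DSUM-driven knot movements discussed below.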

OBJECTIVE FUNCTION

Previous work on this problem lacked sufficient consideration of the objective function on which the optimization in the knot selection algorithm is based. Recall the function DSUM defined above as

$DSUM = \sum_{j=1}^{K} (N/K - N_j)^2$,

where $N_j$ is now the actual number of data points found in the $j$th Dirichlet tile. It is a measure of the evenness of the distribution of the data points amongst the knot points; a smaller value indicates a better distribution. The minimization is justified by the desire to have the Dirichlet tile of each knot contain the same, or nearly the same, number of data points. The function is defined over a continuum of possible knot configurations, each corresponding to a value of the function, although it takes only finitely many values. An analysis of the function is motivated by several questions: What is the minimum value the function can assume? Under what circumstances can the minimum value be attained, and how feasible is attaining it?

First, we consider the minimum value of the objective function. Let $N$, the number of data points, be written as $N = Km + n$, where $K$ is the number of knot points to be used and $m$ and $n$ are integers with $0 \le n < K$.

Two cases must be investigated: (I) $n = 0$ and (II) $n \neq 0$. We shall refer to the even and nearly even distributions of data points in these two cases as the ideal distribution for the respective case of $n$.

Case I occurs when all $K$ knots own the same number of data points; hence $N_j = N/K$. Thus, for $n = 0$, we have $m = N/K$, corresponding to an exactly even distribution of the data points, so that DSUM $= 0$. Case II occurs when $K - n$ knots own $m$ data points and $n$ knots own $m + 1$ data points. It is easy to verify that we are working with $(K - n) + n = K$ knot points. Thus, for $n \neq 0$, $N = Km + n$, or $N/K = m + n/K$, so that

$DSUM = (K - n)(N/K - N_j)^2 + n(N/K - N_j)^2$.

But the first $K - n$ knots own $N_j = m$ data points and the other $n$ knots own $N_j = m + 1$ data points, so that with the substitution above, we have

$DSUM = (K - n)(m + n/K - m)^2 + n(m + n/K - m - 1)^2$.

Simplifying this expression yields DSUM $= n - n^2/K$.

Thus, for the case $N = 5000$, $K = 250$, the minimum value of DSUM is 0; for the case $N = 1776$, $K = 100$, the minimum value of DSUM is $76 - 5776/100 = 18.24$.
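The closed form can be checked against a direct evaluation of DSUM for the ideal distribution and, for small cases, against an exhaustive search over every way of splitting $N$ points among $K$ knots. This is a minimal sketch; the function names are hypothetical and the brute-force search is only practical for tiny $N$ and $K$.

```python
from itertools import product


def ideal_dsum(N, K):
    """DSUM of the ideal distribution, computed directly and by n - n**2/K."""
    m, n = divmod(N, K)                     # N = K*m + n with 0 <= n < K
    counts = [m] * (K - n) + [m + 1] * n    # K - n knots own m, n knots own m + 1
    direct = sum((N / K - c) ** 2 for c in counts)
    return direct, n - n * n / K


def brute_min_dsum(N, K):
    """Minimum DSUM over all splits of N points among K knots (small cases only)."""
    return min(sum((N / K - c) ** 2 for c in split)
               for split in product(range(N + 1), repeat=K) if sum(split) == N)


print(ideal_dsum(5000, 250))   # (0.0, 0.0)
print(ideal_dsum(1776, 100))   # both entries equal 18.24 up to rounding
```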

To obtain some indication of the feasibility of achieving the minimum value of DSUM, we look at the value of DSUM after an ever-so-slight perturbation, as follows. Consider Case I ($n = 0$); the slightest perturbation from the ideal distribution occurs when there is one knot with $m + 1$ data points, $K - 2$ knots with $m$ data points, and one knot with $m - 1$ data points. A quick check of the total number of knot points gives $1 + (K - 2) + 1 = K$, and the total number of data points is $N = (m + 1) + m(K - 2) + (m - 1) = mK$.

Thus,

$DSUM = 1[N/K - (m + 1)]^2 + (K - 2)(N/K - m)^2 + 1[N/K - (m - 1)]^2$.

Substituting $m = N/K$, since $n = 0$, we have

$DSUM = (m - m - 1)^2 + 0 + (m - m + 1)^2 = 2$.

Thus the slightest perturbation from the ideal distribution of data points yields a DSUM value slightly larger than the optimal value.

Other slight perturbations for the example of $N = 100$, $K = 10$ ($n = 0$), such as 2 knots with 9 points, 6 knots with 10 points, and 2 knots with 11 points; or 2 knots with 9 points, 7 knots with 10 points, and 1 knot with 12 points; or 1 knot with 8 points, 8 knots with 10 points, and 1 knot with 12 points, yield DSUM values of 4, 6, and 8, respectively.

Case II, where $n \neq 0$, is a bit more interesting, since a slight perturbation can take on several forms. We previously described the ideal distribution in this case as occurring with $K - n$ knots owning $m$ data points and $n$ knots owning $m + 1$ data points. A quick check of the number of data points reveals that there are $N = (K - n)m + n(m + 1) = Km + n$ data points. One perturbation occurs when there is one knot with $m + 2$ data points, $n - 1$ knots with $m + 1$ data points, $K - n - 1$ knots with $m$ data points, and one knot with $m - 1$ data points. Thus,

$DSUM = 1(N/K - m - 2)^2 + (n - 1)(N/K - m - 1)^2 + (K - n - 1)(N/K - m)^2 + 1(N/K - m + 1)^2$.

Substituting $N/K = m + n/K$, we have

$DSUM = (n/K - 2)^2 + (n - 1)(n/K - 1)^2 + (K - n - 1)(n/K)^2 + (n/K + 1)^2 = 4 + n - n^2/K$

upon simplification. A smaller value is obtained for the other slight perturbations, such as one knot with $m - 1$ data points, $K - n - 2$ knots with $m$ data points, and $n + 1$ knots with $m + 1$ data points, or one knot with $m + 2$ data points, $n - 2$ knots with $m + 1$ data points, and $K - n + 1$ knots with $m$ data points: each of these gives DSUM $= 2 + n - n^2/K$, which exceeds the minimum by 2, just as in Case I. We will take advantage of this knowledge about the objective function, DSUM, later as part of integrating some other techniques into the algorithm for speeding things up.

ALTERNATIVE  KNOT  MOVEMENT  SCHEMES 

Recall the natural heuristic mentioned earlier for moving the knots around the domain of the fixed data points in search of the optimal knot configuration. An essential task in exploiting it lies in identifying the knots owning the most and the fewest data points. The knot movement schemes developed to search for an optimal knot configuration were based primarily on the idea of spreading the wealth of the knots owning the most data points by moving the knots owning the fewest data points toward them. In developing these schemes, we considered both ease of implementation and computational cost to be paramount.

As in the Original scheme, the rationale for moving the knots around the plane is to tweak the current configuration just enough to make the Dirichlet tile boundaries move, so that some of the data points come to belong to a different knot. This is followed by the usual settlement of the knots into the centroids of their respective tiles, with the aim that the settled configuration distributes the data points more evenly among the knots.
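A minimal sketch of the two ingredients this paragraph relies on: assigning each data point to its nearest knot (its Dirichlet tile) and settling each knot into the centroid of what it owns. Interpreting "centroid of the tile" as the mean of the owned data points is an assumption, as are all function names; the DSDM helper uses the sum-of-squared-deviations form of the earlier expressions.

```python
def assign(points, knots):
    """Index of the nearest knot (Euclidean) for every data point."""
    return [
        min(range(len(knots)),
            key=lambda k: (px - knots[k][0]) ** 2 + (py - knots[k][1]) ** 2)
        for px, py in points
    ]


def settle(points, knots, owner):
    """Move each knot to the mean of the data points it owns (assumed centroid)."""
    sums = [[0.0, 0.0, 0] for _ in knots]
    for (px, py), k in zip(points, owner):
        sums[k][0] += px
        sums[k][1] += py
        sums[k][2] += 1
    # A knot that owns no data point is left where it is.
    return [(sx / c, sy / c) if c else knot for (sx, sy, c), knot in zip(sums, knots)]


def counts_and_dsdm(owner, K):
    """Points owned by each knot and the DSDM of that distribution."""
    counts = [owner.count(k) for k in range(K)]
    mean = len(owner) / K
    return counts, sum((mean - c) ** 2 for c in counts)


points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
knots = [(0.0, 0.0), (10.0, 10.0)]
owner = assign(points, knots)
counts, cost = counts_and_dsdm(owner, len(knots))
high, low = counts.index(max(counts)), counts.index(min(counts))
print(owner, counts, cost, high, low)  # [0, 0, 0, 1, 1] [3, 2] 0.5 0 1
print(settle(points, knots, owner))
```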

The original algorithm employs a symmetric scheme to conduct a confined but exhaustive (and correspondingly expensive) search for the optimal configuration of knots. As seen in Figure 2, the so-called low knot (that is, the one owning the fewest data points) is moved toward the high knot (the one owning the most data points). The movement is done along the line connecting them and alternates between the low-toward-high and high-toward-low modes in a symmetrical fashion.

The  distance  moved  along  this  line  connecting  the  two  knots  is  a 
function  of  the  iteration  of  the  movement,  up  to  a  total  of  ten 
moves  each,  or  until  a  better  knot  configuration  is  found.  When  no 
better  configuration  is  found,  the  next  pair  of  high-low  knots  is 
considered,  if  available.  Once  all  possible  combinations  of  high-low 
pairs  have  been  considered,  and  no  better  configuration  has  been  found, 
the  search  is  ended. 
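A sketch of the symmetric search for one high-low pair, written as a generator of trial configurations. The text fixes only the structure (alternating low-toward-high and high-toward-low moves, at most ten each, stopping at the first improvement); the step schedule, in which move i covers the fraction i/(2·moves) of the original separation so the knots never pass the midpoint, and all names are assumptions. `cost` stands for any configuration cost such as DSDM.

```python
def symmetric_trials(knots, high, low, moves=10):
    """Yield trial knot lists, alternating low-toward-high and high-toward-low moves."""
    (hx, hy), (lx, ly) = knots[high], knots[low]
    for i in range(1, moves + 1):
        t = i / (2 * moves)  # assumed fraction of the separation for move i
        for mover, (sx, sy), (tx, ty) in ((low, (lx, ly), (hx, hy)),
                                          (high, (hx, hy), (lx, ly))):
            trial = list(knots)
            trial[mover] = (sx + t * (tx - sx), sy + t * (ty - sy))
            yield trial


def search_pair(knots, high, low, cost, moves=10):
    """Return the first trial with a lower cost than the current configuration, else None."""
    base = cost(knots)
    for trial in symmetric_trials(knots, high, low, moves):
        if cost(trial) < base:
            return trial
    return None  # the caller moves on to the next high-low pair
```

Returning `None` mirrors the text: when no trial helps, the next high-low pair is tried, and the search ends once all pairs are exhausted.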

The Outward Bound scheme (Figure 3) is characterized by a move of a high knot away from the low knot, along the line between the two extended beyond the high knot. Such a move is justified by the vacuum it creates near the previous location of the high knot. When such a move fails to lead to a better configuration, the distance moved along the same line is decreased as a function of the iteration number until the high knot settles back to its original location. Note that because these second and successive bounds are made closer to the concentration of data points, it is more likely that the new knot location will absorb some of the extra data points in the local vicinity. This is followed by a move of the low knot toward the high knot along the line connecting them, in an effort to relieve some of the pressure near the high knot. As before, when these moves fail to lead to a better configuration, the distance moved is decreased as a function of the iteration number until the low knot returns to its original location.
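The Outward Bound trial moves can be generated in the same way. The schedules here are assumptions: the first bound carries the high knot one full separation beyond itself and later bounds shrink linearly toward its original spot, after which the low knot moves toward the high knot by at most half the separation, also shrinking. Names are hypothetical.

```python
def outward_bound_trials(knots, high, low, moves=10, reach=1.0):
    """Yield trial knot lists for the Outward Bound scheme (assumed schedules)."""
    (hx, hy), (lx, ly) = knots[high], knots[low]
    dx, dy = hx - lx, hy - ly  # direction from the low knot to the high knot
    # High knot: bounds beyond itself, each closer to its original location.
    for i in range(moves, 0, -1):
        t = reach * i / moves
        trial = list(knots)
        trial[high] = (hx + t * dx, hy + t * dy)
        yield trial
    # Low knot: moves toward the high knot, by decreasing distances.
    for i in range(moves, 0, -1):
        t = 0.5 * i / moves
        trial = list(knots)
        trial[low] = (lx + t * dx, ly + t * dy)
        yield trial
```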

An even better scheme evolved from the last one, wherein the low knot is moved to coincide with the high knot (Figure 4). This approach takes advantage of the subroutine's built-in contingency for handling the case wherein knots begin to coalesce: to preclude such coalescence, one of the two knots is immediately moved on top of the nearest data point, wherever it may be located. This method has the added advantage of moving a knot in a direction entirely different from the line connecting the high-low pair of knots being considered. As seen in Figure 4, once splitting the monopoly of the high knot fails to lead to a better configuration, the low knot is moved along the line connecting the two knots by some decreasing distance. The high knot is also moved out along the same line, beyond its current location, in the direction opposite the low knot.
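A sketch of the monopoly-splitting trial move. The text does not say how the nearest data point is chosen when the coinciding knots already sit on a data point, so excluding points occupied by the other knots is an assumption, as are the names.

```python
def split_monopoly(knots, points, high, low):
    """Trial configuration for the monopoly-splitting move.

    The low knot is sent to the high knot's location; to avoid two coincident
    knots it is immediately placed on the data point nearest that location,
    skipping points already occupied by another knot (assumed rule).
    """
    hx, hy = knots[high]
    occupied = {k for i, k in enumerate(knots) if i != low}
    candidates = [p for p in points if p not in occupied]
    nearest = min(candidates, key=lambda p: (p[0] - hx) ** 2 + (p[1] - hy) ** 2)
    trial = list(knots)
    trial[low] = nearest
    return trial
```

If this trial does not lower the cost, the follow-up moves along the high-low line can be generated in the same way as for the Outward Bound scheme.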

Finally, we considered the situation wherein the desired movement is just large enough to nudge the Dirichlet tessellation into one of its neighboring configurations, one containing the optimal solution (Figure 5). Thus the distance moved, or trial distance, became a function of the area of the domain of the data and the number of knots being used. This trial distance is increased as a function of the iteration, up to a set amount, for as long as no better configuration is found. This tack was also used in conjunction with the monopoly-splitting approach mentioned above.

After much testing with several different test data sets, we settled on a combination of several of these approaches, as we shall see. It became apparent that more combinations of high-low pairs needed to be considered in any scheme employed. Thus, whenever fewer than five high-low pairs of knots were found to exist, more were generated by identifying the knot owning the second most data points, and so on, until at least five such combinations could be considered. We note that we could also have considered the knot(s) owning the second fewest data points; however, such consideration is unwarranted, since the best results are obtained by breaking up the monopoly of the knots owning the most (or second most, and so on, as the case may be) data points.
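A sketch of the pair generation rule: pair every knot at the highest ownership level with every lowest knot, and add the next ownership level to the high set until at least five pairs exist (or no levels remain). The names and the tie handling are assumptions.

```python
def high_low_pairs(counts, minimum=5):
    """High-low knot pairs, widening the 'high' set until at least `minimum` pairs exist.

    counts[k] is the number of data points owned by knot k.
    """
    low_count = min(counts)
    lows = [k for k, c in enumerate(counts) if c == low_count]
    highs, pairs = [], []
    # Visit ownership levels from the most data points downward.
    for level in sorted({c for c in counts if c > low_count}, reverse=True):
        highs += [k for k, c in enumerate(counts) if c == level]
        pairs = [(h, l) for h in highs for l in lows]
        if len(pairs) >= minimum:
            break
    return pairs  # empty when every knot owns the same number of points


print(high_low_pairs([12, 10, 10, 9, 8, 11, 10, 10, 10, 10])[:3])  # [(0, 4), (5, 4), (1, 4)]
```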

CELL  METHOD 

All of these knot movement schemes involve identifying the knots owning the most data points and those owning the fewest. This task follows from having determined which knot is closest to which data point. One obvious improvement needed to speed up any of the schemes is a way of determining the closest knot for each data point without considering every knot-data point pair again and again. In other words, we needed to take advantage of the fact that not all points need to be checked every time. The Cell Method for locating the closest knot to a given data point was developed in a general setting by Renka [4]. Its use involves two subroutines, STORE2 and GETNP2. We will describe the general idea of the algorithm in terms of its application to the problem of knot selection.

The motivation behind the use of the cell method was simply to find a better, faster means of identifying the closest knot for each of the data points. The original program took a brute-force approach wherein the Euclidean distance for each of the N·K knot-data point pairs was computed and compared with the others, one at a time, until the closest knot was found for each data point. Thus a minimum of N·K distance computations had to be made each time the subroutine was invoked to determine which data points belonged to which knot and to move the knots to minimize the distances.
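For reference, the brute-force assignment amounts to the double loop below; the counter makes the N·K cost explicit. It is a plain illustration with a hypothetical name, not the original program's subroutine.

```python
def nearest_knots_brute(points, knots):
    """Brute-force nearest knot for every data point, counting distance evaluations."""
    owner, evaluations = [], 0
    for px, py in points:
        best, best_d2 = -1, float("inf")
        for k, (kx, ky) in enumerate(knots):
            d2 = (px - kx) ** 2 + (py - ky) ** 2  # squared distance orders the same way
            evaluations += 1
            if d2 < best_d2:
                best, best_d2 = k, d2
        owner.append(best)
    return owner, evaluations


owner, evaluations = nearest_knots_brute([(0, 0), (5, 5), (9, 1)], [(0, 1), (8, 0)])
print(owner, evaluations)  # [0, 1, 1] 6
```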

A simple example and a sketch of the situation offer the best explanation of the method (Figure 6). First consider the K knots, which have variable locations, while the N data points are fixed. The smallest rectangle containing the knots is found and partitioned into a 3 x 3 uniform grid of cells (by the STORE2 subroutine). Since not all the cells will contain knots, the indices of the knots contained in each cell are recorded. Now, for a given single data point, a call to GETNP2 is made to find the nearest knot. A search is begun in the cell containing the data point or in the cell to which it is closest. Cells that contain no knots are skipped. The distance between the data point and the first knot encountered is calculated, and the search is confined to those cells within that distance of the data point under consideration. Thus only the knots within those cells need be considered, which reduces the scope of the search for the closest knot.
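The following is a simplified re-implementation of the idea, not a transcription of Renka's STORE2 and GETNP2: knots are bucketed into a uniform 3 x 3 grid over their bounding rectangle (empty cells are simply never recorded), and cells are visited in order of their smallest possible distance to the data point, stopping once no unvisited cell can hold a closer knot. Ordering cells by this lower bound is a swapped-in simplification of the search described above; the class and method names are hypothetical.

```python
import math


class CellGrid:
    """Knots bucketed into an nr x nr grid of cells over their bounding rectangle."""

    def __init__(self, knots, nr=3):
        self.knots, self.nr = knots, nr
        xs, ys = [k[0] for k in knots], [k[1] for k in knots]
        self.xmin, self.ymin = min(xs), min(ys)
        # Cell widths; fall back to 1.0 if all knots share an x (or y) coordinate.
        self.dx = (max(xs) - self.xmin) / nr or 1.0
        self.dy = (max(ys) - self.ymin) / nr or 1.0
        self.cells = {}  # only non-empty cells are recorded
        for idx, (x, y) in enumerate(knots):
            self.cells.setdefault(self._cell(x, y), []).append(idx)

    def _cell(self, x, y):
        i = min(max(int((x - self.xmin) / self.dx), 0), self.nr - 1)
        j = min(max(int((y - self.ymin) / self.dy), 0), self.nr - 1)
        return i, j

    def _lower_bound(self, cell, x, y):
        """Smallest possible distance from (x, y) to any knot stored in `cell`."""
        i, j = cell
        # Edge cells are treated as unbounded outward, which keeps the bound valid.
        x0 = -math.inf if i == 0 else self.xmin + i * self.dx
        x1 = math.inf if i == self.nr - 1 else self.xmin + (i + 1) * self.dx
        y0 = -math.inf if j == 0 else self.ymin + j * self.dy
        y1 = math.inf if j == self.nr - 1 else self.ymin + (j + 1) * self.dy
        return math.hypot(max(x0 - x, 0.0, x - x1), max(y0 - y, 0.0, y - y1))

    def nearest(self, x, y):
        """Index of the knot nearest to the data point (x, y)."""
        best, best_d = -1, math.inf
        for cell in sorted(self.cells, key=lambda c: self._lower_bound(c, x, y)):
            if self._lower_bound(cell, x, y) >= best_d:
                break  # every remaining cell is at least this far away
            for idx in self.cells[cell]:
                d = math.hypot(x - self.knots[idx][0], y - self.knots[idx][1])
                if d < best_d:
                    best, best_d = idx, d
        return best


grid = CellGrid([(0, 0), (10, 0), (0, 10), (10, 10)])
print(grid.nearest(9, 8))  # 3
```

A larger `nr` trades more cells for fewer knots per cell; the grid is rebuilt whenever the knots move.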

This procedure must be followed for each data point in turn, but its scope is much reduced compared with the brute-force approach; we estimate a savings of around 25% of the original computational effort. The task of locating the closest knot for each data point is performed by the MINORM subroutine, which is called twice by the search subroutine, TWEEK. That is, each time a different configuration of knots is proposed within the TWEEK subroutine, MINORM is invoked twice, so it is easy to appreciate the scope of the savings gained by using the Renka Cell Method.

Additionally, the MINORM subroutine was enhanced so that it can be applied in a more general setting, wherein a prospective user specifies a weight for each of the data points. One could think of the weight as being the reciprocal of the standard deviation of the error associated with the data measurement at a given data point. Hence, instead of summing the number of data points in each Dirichlet tile, the weights associated with the data points within the tile are summed. With a relatively large weight at a given data point, one would be able to force the knot to be at, or very near, that data point. The rest of the knot selection algorithm works as before. However, before solving the least squares system, each equation must be scaled by the value of the corresponding weight.
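A minimal sketch of the two weighted changes: summing weights instead of counting points in each tile, and scaling each least squares equation by its weight (standard weighted least squares with weights 1/σ). The measurement standard deviations and function names are hypothetical.

```python
def weighted_tile_sums(owner, weights, K):
    """Sum the data point weights in each knot's Dirichlet tile (replaces point counts)."""
    sums = [0.0] * K
    for k, w in zip(owner, weights):
        sums[k] += w
    return sums


def scale_equations(A, b, weights):
    """Scale each least squares equation (row of A and entry of b) by its weight."""
    A_scaled = [[w * a for a in row] for row, w in zip(A, weights)]
    b_scaled = [w * bi for bi, w in zip(b, weights)]
    return A_scaled, b_scaled


sigma = [0.5, 1.0, 2.0, 1.0]        # hypothetical measurement standard deviations
weights = [1.0 / s for s in sigma]  # weight = 1 / standard deviation
owner = [0, 0, 1, 1]                # nearest-knot index for each data point
print(weighted_tile_sums(owner, weights, 2))  # [3.0, 1.5]
```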

SIMULATED  ANNEALING 

Besides incorporating the cell method and improving the tweaking scheme to speed up the knot selection process, it became apparent that another approach, Simulated Annealing (SA), would prove useful. SA, which is also known by other names such as Monte Carlo annealing, statistical cooling, probabilistic hill climbing, stochastic relaxation, and probabilistic exchange algorithm, was independently developed and introduced by Kirkpatrick et al. in 1982 and Černý in 1985. The name comes from an analogy to the slow cooling of a solid until it reaches its low-energy ground state, as developed by Metropolis et al. in 1953 [5]. Here, a cost function assumes the role of energy, a control parameter is substituted for temperature, and knot configurations are analogous to states of the solid. The SA algorithm is a general approximation algorithm for solving a wide variety of combinatorial optimization problems, such as the knot selection problem. It obtains near-optimal solutions by incorporating randomization into an iterative improvement algorithm.

The application of an iterative improvement algorithm presupposes the definition of configurations, a cost function, and a mechanism for generating transitions from one configuration to another, all of which are present in the knot selection problem. The solutions obtained using SA have the additional advantage of being largely independent of the initial configuration of knots, and SA usually leads to a solution near the minimum. The similarities between the original knot selection algorithm and the SA algorithm extend beyond this necessary overhead. As with the original knot selection algorithm, a configuration is given, followed by the generation of a sequence of configurations which are compared with the current configuration in terms of the evenness of the distribution of data points. When a neighboring configuration has a lower cost, the current configuration is replaced by the better one.

Randomization comes into play when a better configuration is not found: small increases in the cost function are then permitted with a non-zero but decreasing probability. This Metropolis criterion, as it is called, is implemented by drawing random numbers from a uniform distribution on [0, 1) and comparing each in turn with the exponentially decreasing acceptance probability $\exp(-\Delta C_{ij}/c)$, where $\Delta C_{ij} > 0$ is the increase in cost from the current configuration i to the proposed configuration j and c is the control parameter; the worse configuration is accepted when the random number falls below this probability. Initially, the control parameter is given a high value; as the algorithm proceeds, c is made smaller until virtually no deteriorations in cost are accepted and the algorithm terminates. Thus, the key ingredient of the SA algorithm lies in its occasional acceptance of a worse configuration early in the search.
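A minimal sketch of the Metropolis criterion as described: improvements are accepted outright, and a deterioration ΔC > 0 is accepted when a uniform draw on [0, 1) falls below exp(-ΔC/c). The function name and the use of Python's `random` module are assumptions.

```python
import math
import random


def metropolis_accept(delta_cost, c, rng=random):
    """Metropolis criterion for a proposed move that changes the cost by delta_cost."""
    if delta_cost <= 0:
        return True  # improvements (and ties) are always accepted
    return rng.random() < math.exp(-delta_cost / c)


rng = random.Random(7)
c = 10 / math.log(2)  # control parameter at which a cost increase of 10 has probability 0.5
rate = sum(metropolis_accept(10, c, rng) for _ in range(20000)) / 20000
print(round(rate, 2))  # approximately 0.5
```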

For our particular application of the SA algorithm, we were again concerned about ease of implementation and additional computational cost. As we shall see, neither concern was warranted. As a point of departure, suppose that after 20 iterations of the SA algorithm we wish to have the probability of accepting a worse configuration down to around .01. We can determine the corresponding value of the control parameter c from the Metropolis criterion; that is, we ask for what value of c the probability of acceptance approaches .01 after some 20 iterations, given an average difference of 5 in the cost function analyzed earlier for Case I. Thus we solve $\exp(-5/c) = .01$ for c, which yields $c = 5/\ln 100 \approx 1.09$; we take our control parameter value after 20 or so iterations to be around 1.0.

We continue with a determination of the initial value of the control parameter. Consider the simple 100 point/10 knot case (n = 0), wherein we are initially willing to accept, with probability 0.5 in the first iteration, a worse configuration whose average cost is not greater than 10 (as compared with 2 for the slightest perturbation). Thus, using the Metropolis criterion again, we solve $\exp(-10/c) = 0.5$ for c, which yields $c \approx 14$. Using the same probability of acceptance for a 500 data point set, where the average cost is no greater than 50 in the first iteration, yields a control parameter value of about 70. In general, we could express the initial value of the control parameter as $c_0 = -N/[10 \ln(0.5)]$, where N is the number of data points and the probability of acceptance is initially 0.5.
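The numbers in this paragraph and the preceding one follow from solving the Metropolis criterion for c. The sketch uses the natural logarithm; the function names are hypothetical. For 500 points the formula gives about 72, which the text rounds to 70.

```python
import math


def control_parameter(delta_cost, p_accept):
    """Solve exp(-delta_cost / c) = p_accept for the control parameter c."""
    return -delta_cost / math.log(p_accept)


def initial_control_parameter(N, p0=0.5):
    """c_0 = -N / (10 ln p0): an average cost increase of N/10 is accepted with probability p0."""
    return -N / (10 * math.log(p0))


print(round(control_parameter(5, 0.01), 2))      # 1.09
print(round(control_parameter(10, 0.5), 2))      # 14.43
print(round(initial_control_parameter(500), 1))  # 72.1
```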

Having bracketed the initial and final values of the control parameter using initial and final probabilities of acceptance of 0.5 and .01, respectively, we are now in a position to develop an expression describing how the control parameter decreases as a function of the number of iterations through the SA algorithm. Let the recursive formula $c_{i+1} = \alpha c_i$ describe the behavior of the control parameter. For the 500 data point case, we have $c_{20} = 1 = c_0/70$ and $c_1 = \alpha c_0$. Since $c_2 = \alpha c_1$, $c_3 = \alpha c_2 = \alpha^2 c_1$, and $c_4 = \alpha c_3 = \alpha^3 c_1$, we have $c_{20} = \alpha^{19} c_1 = \alpha^{20} c_0$. Thus, since $c_{20} = 1 = c_0/70 = \alpha^{20} c_0$, we have $\alpha^{20} = 1/70$, or $\alpha \approx 0.81$. Therefore the recursive formula is $c_{i+1} \approx 0.8\,c_i$ for the 500 point case. We can apply this recursive formula approach to our application of the SA algorithm in general as follows. Since $c_I = \alpha^I c_0$, we have $\alpha^I = c_I/c_0 = 1/c_0$ for I = 20, where I is the number of iterations. Thus, upon simplification, $I \ln(\alpha) = -\ln(c_0)$, or, following exponentiation,

$$\alpha = \exp[-\ln(c_0)/I].$$
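The schedule can be computed directly from $c_0$ and the target number of iterations. The sketch assumes the final value $c_I = 1$ used in the text (exposed as a parameter) and the geometric recursion $c_{i+1} = \alpha c_i$; the names are hypothetical.

```python
import math


def cooling_ratio(c0, iterations=20, c_final=1.0):
    """alpha such that c_final = alpha**iterations * c0, i.e. exp[-ln(c0 / c_final) / I]."""
    return math.exp(-math.log(c0 / c_final) / iterations)


def schedule(c0, alpha, iterations=20):
    """Control parameter values c_0, c_1, ..., c_I with c_{i+1} = alpha * c_i."""
    values = [c0]
    for _ in range(iterations):
        values.append(alpha * values[-1])
    return values


alpha = cooling_ratio(70.0)
print(round(alpha, 3))                      # 0.809
print(round(schedule(70.0, alpha)[-1], 6))  # 1.0
```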

We attempted one other modification to the SA method as part of its implementation in our knot selection application. Instead of making the probability of acceptance an exponential function of the control parameter and of the difference in cost between the best current configuration and the proposed configuration, we made it a linear function of the same. We found that this produced more acceptances of worse configurations early in the search, which increased the likelihood that a better overall configuration would be found.

THE  ENHANCED  ALGORITHM  AND  ITS  APPLICATION 

We are now ready to outline the enhanced knot selection algorithm, which is the subject of this paper. As before, we first identify the knots owning the most and the fewest data points, making use of the cell method to accomplish the task efficiently. When fewer than five high-low pairs are found, the knot(s) owning the second most data points are identified and added to the search scheme. The knot moving scheme is then invoked, tuned to the user's requirement for the degree of search to be carried out. As a minimum, the low knot is moved to coincide with the high knot, necessitating an immediate move of one knot on top of the nearest data point. This is the monopoly-splitting maneuver mentioned earlier. A greater degree of search involves subsequent moves of the low knot toward the high knot along the line connecting them once the monopoly splitting fails. Additional moves of the high knot away from the low knot along the line connecting them follow, in accordance with the Outward Bound scheme. Whenever any of these moves produces a worse configuration than the best one found to date, the simulated annealing method is triggered as previously described.

Another question that usually comes to mind is how one might put this knot selection algorithm to use in conjunction with surface evaluation using least squares thin plate splines. The program which we wrote for use here at the Academy, and will publish for use by the scientific community in general, incorporates several different options depending on what a prospective user might wish to do. The basic thrust of our effort has been to write a compact and efficient program to be used inside a larger user-generated program written for some specific purpose. One call is made to a manager subroutine, which identifies which option the user requires and then sets up the necessary workspaces for efficient computation. As such, we envision the brunt of the computation time in search of an optimal knot configuration being spent in some preprocessing done by the user before any actual surface evaluation inside the user's program. Once the optimized knot locations have been identified, they can be used again and again within the larger code unless, of course, the user generates more data as part of his particular methodology.

The first option sets up the knot selection problem, optimizes the knot locations, and solves for the least squares coefficients using the thin plate spline function. A user may provide his own initial guess for the knots, or they can be generated automatically in a quasi-gridded fashion. Alternatively, the user may skip the knot optimization altogether and provide his own optimized knots. This constitutes the extent of any preprocessing the user may wish to perform. However, given the parameters in the knot selection problem, including the seed for the random number generator used with SA and the extent-of-search indicator, the user may wish to conduct further tests during the preprocessing phase in order to determine the best values of these parameters for his particular application. A user-specified uniform grid of points is then used to construct a surface from the least squares coefficients found earlier. At this point, the user may wish to invoke the manager subroutine at regular or irregular time intervals in order to evaluate the surface using the least squares thin plate spline approximation method.

Introduction

There are several goodness-of-fit tests based on the empirical distribution function (e.d.f.), for example the Kolmogorov-Smirnov, the Cramer-von Mises, the Anderson-Darling, and Watson's tests. The e.d.f. is defined by

$$F_n(x) = \begin{cases} 0, & x < x_{(1)}, \\ i/n, & x_{(i)} \le x < x_{(i+1)}, \\ 1, & x \ge x_{(n)}, \end{cases}$$

where $x_{(i)}$ is the ith order statistic from a sample of n observations; more simply stated, $F_n(x)$ is the proportion of observations less than or equal to x. If the hypothesis is simple, that is, $F_0(x)$ is fully specified, then for each x the Strong Law of Large Numbers gives $P\{\lim_{n\to\infty} F_n(x) = F_0(x)\} = 1$. In a sense this is the prosyllogism for all statistical theory. The well-known, distribution-free, univariate Cramer-von Mises test statistic

$$\omega^2 = n \int_{-\infty}^{\infty} [F_n(x) - F_0(x)]^2 \, dF_0(x), \tag{2.1}$$

where n is the sample size, is based on a measure of divergence from this fundamental relationship. With the introduction of nuisance parameters, the composite $\omega^2$ statistic is no longer distribution-free, which creates difficulties. Additionally, extending this test of fit to the multivariate arena compounds the difficulty. It is our intent to extend the composite Cramer-von Mises goodness-of-fit test statistic to p dimensions. All results and power studies against alternatives are obtained through Monte Carlo simulation.
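A minimal sketch of the e.d.f. and of (2.1) for a fully specified $F_0$. Carrying out the integral in (2.1) over the steps of $F_n$ gives sums over the order statistics; the two functions below implement two standard equivalent sum forms (the n/3 form and the usual 1/(12n) computational form), so they should agree to rounding error. The standard normal $F_0$ and all function names are assumptions for illustration.

```python
import bisect
import math


def edf(sample):
    """Return F_n: the proportion of observations less than or equal to x."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n


def std_normal_cdf(x):
    """A fully specified F_0 for the example: the standard normal d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cvm_integrated(sample, F0):
    """omega^2 = n/3 + sum_i [F0(x_(i))^2 - (2i - 1) F0(x_(i)) / n]."""
    z = sorted(F0(x) for x in sample)  # F0 is nondecreasing, so this follows x_(i)
    n = len(z)
    return n / 3 + sum(zi * zi - (2 * i - 1) * zi / n for i, zi in enumerate(z, start=1))


def cvm_computational(sample, F0):
    """omega^2 = 1/(12n) + sum_i [F0(x_(i)) - (2i - 1)/(2n)]^2."""
    z = sorted(F0(x) for x in sample)
    n = len(z)
    return 1 / (12 * n) + sum((zi - (2 * i - 1) / (2 * n)) ** 2 for i, zi in enumerate(z, start=1))


F = edf([2.0, 1.0, 3.0, 2.0])
print(F(0.5), F(1.0), F(2.0), F(3.5))  # 0.0 0.25 0.75 1.0

sample = [-1.2, 0.3, 0.8, -0.1, 2.0]
print(cvm_integrated(sample, std_normal_cdf), cvm_computational(sample, std_normal_cdf))
```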




Historical  Remarks 

Cramer (1928) approached the basic problem of testing the hypothesis that a sample of n independent observations comes from a fully specified distribution by measuring the discrepancy between the e.d.f., $F_n(x)$, and the hypothesized d.f., $F_0(x)$, with the statistic

$$J_n = \int [F_n(x) - F_0(x)]^2 \, dx .$$

This was generalized by von Mises (1931) to

$$\omega^2 = \int g(x)\,[F_n(x) - F_0(x)]^2 \, dx,$$

where g(x) is a suitable weight function. Smirnov (1936) modified this to

$$W^2 = n \int_{-\infty}^{\infty} \psi[F_0(x)]\,[F_n(x) - F_0(x)]^2 \, dF_0(x)$$

to yield a distribution-free statistic. The special case $\psi \equiv 1$, (2.1), is commonly called the Cramer-von Mises test statistic.

Little is known about the exact distribution of the Cramer-von Mises test statistic, even in the univariate, fully specified case. Several authors, including Anderson and Darling (1952), Durbin and Knott (1972), and Knott (1974), have studied the distribution of the statistic in the simple univariate setting. These univariate results have been extended to the composite hypothesis by Neuhaus (1973), Durbin, Knott and Taylor (1975), and Stephens (1976). Percentage points are given by Pearson and Hartley (1972). Multivariate extensions of the test statistic have been studied for the simple hypothesis by Rosenblatt (1952), Dugué (1969), Durbin (1970), Krivyakov, Martynov and Tyurin (1977), and Cotterill and Csörgő (1982). The multivariate composite setting has been investigated by Pettitt (1979) and Koziol (1982). The most common technique used in these studies

...is  to  first  find  the  decomposition  of  the  integral 
operator  associated  with  the  covariance  kernel  of  the 
process  in  terms  of  its  eigenvalues  and  eigenvectors. 
Since  the  characteristic  function  of  the  functional 
may  be  expressed  in  terms  of  these  eigenvalues,  the 
requisite  distribution  may  then  be  calculated  by 
numerical inversion of the characteristic function.

Koziol  (1982) 


The  Methodology. 

The univariate Cramer-von Mises statistic (2.1) integrates to

$$\omega^2 = \frac{n}{3} + \sum \Big[F_0^2(x_{(i)}) - \frac{1}{n}(2i - 1)F_0(x_{(i)})\Big], \tag{2.2}$$

where $x_{(i)}$ is the ith order statistic of n independent observations and summations are from i = 1 to n. It can be shown that

$$\omega^2 = \frac{1}{12n} + \sum \Big[F_0(x_{(i)}) - \frac{2i - 1}{2n}\Big]^2,$$

the usual computational form of the statistic. If one considers the multivariate case, we have


“n  D  ’  nH  I  F„  <x;>  '  n  Fn;<x;>l  ndFn.(x.) 

,p  %  j*» 


(2.3) 


nj  y  j-i  °j  j  J  j-t  °j  j 

l  (1  -  i  l  F*  *  S  [1  .  ly  l  in  -  1)F  <x.)]i. 

J  4  j»l  i*l  J  j«l  n  i*l  J 


This  is  a  direct  extension  of  (2.2);  if  the  Fj,  j  ■  l,2,...,p,  are  orthogonal. 
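To make (2.3) concrete, the sketch below evaluates its right-hand side for a sample whose margins are compared with given d.f.s $F_{0j}$. It assumes, as the product form does, that $F_n$ is the product of the marginal e.d.f.s; for p = 1 it reduces to (2.2). The function names and the standard normal margins are illustrative.

```python
import math


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def omega2_np(sample, F0s):
    """Product-form omega^2_{n,p}: n/3^p - 2n * prod_j A_j + n * prod_j B_j.

    sample is a list of n p-dimensional observations and F0s holds the p
    hypothesized marginal d.f.s.  For margin j, with z the sorted F0_j values,
    A_j = (1/(2n)) sum_i (1 - z_i^2) and B_j = (1/n) sum_i [1 - (2i - 1) z_(i) / n].
    """
    n, p = len(sample), len(F0s)
    cross, square = 1.0, 1.0
    for j, F0 in enumerate(F0s):
        z = sorted(F0(x[j]) for x in sample)  # margin j, transformed and ordered
        cross *= sum(1.0 - zi * zi for zi in z) / (2 * n)
        square *= sum(1.0 - (2 * i - 1) * zi / n for i, zi in enumerate(z, start=1)) / n
    return n / 3 ** p - 2 * n * cross + n * square


bivariate = [(0.1, -0.4), (1.3, 0.2), (-0.7, 0.9), (0.4, -1.5)]
print(omega2_np(bivariate, [std_normal_cdf, std_normal_cdf]))
```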

Let $x_1, \ldots, x_n$ be independent random p-dimensional vectors drawn from a gaussian population with d.f. $F_\theta$, $\theta = (\mu, D)$, where μ is an arbitrary p-dimensional vector and D is an arbitrary p × p positive definite matrix. Define $\psi(x, \theta) = (x - \mu)D^{-1/2}$. If $\theta = \theta_0$, then the transformation $X_i \mapsto Y_i = \psi(X_i, \theta_0)$ yields a random sample of p-dimensional standard gaussian observations, i.e., $Y \sim N_p(0, I)$.




In most situations, $\theta_0$ is not known and must be estimated from the sample. It is therefore of interest to determine critical values for assessing multivariate gaussianity with goodness-of-fit criteria when the parameter θ is estimated. Let $\hat\theta_n$ denote the maximum likelihood estimate of $\theta_0$ based on $x_1, \ldots, x_n$, that is, $\hat\theta_n = (\bar X_n, \hat D_n)$, where $\bar X_n = n^{-1}\sum X_i$ and $\hat D_n = n^{-1}\sum (X_i - \bar X_n)^T(X_i - \bar X_n)$ for row vectors $X_i$. Under the null hypothesis, $Y^* = (X - \bar X)\hat D^{-1/2}$ is spherically normal, where $\hat D^{-1/2} = P\Lambda^{-1/2}P^T$, Λ is a diagonal matrix having as its entries the eigenvalues of $\hat D$, and P is a matrix of the associated eigenvectors of $\hat D$. Thus

$$\hat F_n(y_1, \ldots, y_p) = \hat F_{n1}(y_1)\cdots\hat F_{np}(y_p) = \frac{\#\{i : y^*_{i1} \le y_1\}}{n}\cdots\frac{\#\{i : y^*_{ip} \le y_p\}}{n},$$

a direct extension of the univariate e.d.f., and (2.3) holds.

Monte Carlo simulations of (2.3) for gaussianity were performed. For all dimensions and sample sizes, 10,000 replications of the simulation were run. While (2.3) is believed to approach its asymptotic distribution very rapidly, the large number of replications ensured convergence of the simulated percentage points.
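A scaled-down sketch of such a simulation for the composite case with p = 1, where the estimated transformation reduces to $(x - \bar x)/\hat\sigma$ with the maximum likelihood (divisor n) variance. Python's `random.gauss`, the seed, the 2,000 replications and the empirical-quantile rule are assumptions, not the authors' setup (which used NAG routines and 10,000 replications); the estimate should be close to the asymptotic 5% point for this case, about 0.126, without reproducing the tables exactly.

```python
import math
import random


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cvm(z):
    """Computational form of omega^2 from hypothesized d.f. values z (any order)."""
    z = sorted(z)
    n = len(z)
    return 1 / (12 * n) + sum((zi - (2 * i - 1) / (2 * n)) ** 2 for i, zi in enumerate(z, start=1))


def composite_cvm(sample):
    """omega^2 after standardizing with the ML estimates of the mean and variance."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    return cvm([std_normal_cdf((x - mean) / sd) for x in sample])


def critical_value(n, alpha=0.05, reps=2000, seed=1989):
    """Empirical upper-alpha point of composite omega^2 under the gaussian null."""
    rng = random.Random(seed)
    stats = sorted(composite_cvm([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps))
    return stats[math.ceil((1 - alpha) * reps) - 1]


print(critical_value(50))
```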

An alternative approach investigated is to orthogonalize the p-dimensional sample of size n as above and then consider these transformed observations as np univariate standard normal variates. This method was also studied through Monte Carlo simulation, and the associated statistic is denoted $\omega^2_{n\cdot p}$. A final approach investigated is to consider the maximum of $\omega^2_{n,i}$ over the p margins. This statistic is denoted $\max_p \omega^2_n$.
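Both alternative statistics are easy to compute once the sample has been orthogonalized. In the sketch below, Y is assumed to be already transformed (rows are observations, columns are coordinates) and $F_0$ is the standard normal d.f.; the pooled statistic treats the np entries as one univariate sample, and the maximum is taken over the p marginal statistics. The names are hypothetical.

```python
import math


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cvm(z):
    """Computational form of omega^2 from hypothesized d.f. values z (any order)."""
    z = sorted(z)
    n = len(z)
    return 1 / (12 * n) + sum((zi - (2 * i - 1) / (2 * n)) ** 2 for i, zi in enumerate(z, start=1))


def pooled_and_max(Y, F0=std_normal_cdf):
    """Return (pooled omega^2 over all n*p entries, maximum of the p marginal omega^2)."""
    p = len(Y[0])
    pooled = cvm([F0(v) for row in Y for v in row])
    marginal = [cvm([F0(row[j]) for row in Y]) for j in range(p)]
    return pooled, max(marginal)


Y = [[0.1, -1.0], [-0.5, 0.3], [1.2, 0.4], [-0.2, 1.9]]
print(pooled_and_max(Y))
```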

All simulations were performed on a Prime 850 minicomputer using double precision. The random gaussian variates were obtained using NAG subroutines (Numerical Algorithms Group, Chapter G05, 1983).

Simulation Results.

Critical values from initial simulations for $\omega^2_{n,p}$ of sample sizes 8, 10, 12, 15, 20, 24, 30, 40, 60, and 120 for dimensions 1 through 6 are presented in Table 2.1.

Our univariate results are not in consonance with Pearson and Hartley (1972), who give for the composite case the following critical values, where α is the probability in the right tail. They indicate a relationship dependent upon sample size.

Statistic          α = .10   α = .05   α = .01
ω²(1 + 1/(2n))     0.104     0.126     0.178

Our critical values do not follow the smooth curve suggested by the above formula for n < 40. For larger sample sizes, our critical values are slightly smaller, as shown in Figure 2.1. This may be due to the initialization of each simulation with the same seed, which causes random numbers to be repeated across differing sample sizes. For example, the first 5,000 samples of n = 30 reproduce the 10,000 samples of n = 15.
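The overlap is easy to reproduce with any seeded generator. This illustration uses Python's `random` module rather than the NAG G05 routines, with a hypothetical seed: two runs that start from the same seed consume exactly the same stream of variates, regardless of how it is cut into samples.

```python
import random


def draws(n, reps, seed):
    """reps samples of size n from one generator initialized with `seed`."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(reps)]


a = draws(15, 10_000, seed=12345)
b = draws(30, 5_000, seed=12345)
# Both runs use the same 150,000 variates in the same order.
print([v for s in a for v in s] == [v for s in b for v in s])  # True
```

Giving each sample size its own seed (or its own independent stream) removes this overlap.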

Simulations of sample size 600 were not affected by the choice of simulation initialization value, and these percentage points were in close agreement with Pearson and Hartley. Thus $n = 600$ was chosen as the sample size for the simulation producing the critical values for dimensions 1 through 12. Various percentage points and moments are given in Tables 2.2 and 2.3. From the close agreement of the statistics' coefficients of skewness, $\alpha_3 = \mu_3/\mu_2^{3/2}$, and kurtosis, $\alpha_4 = \mu_4/\mu_2^2$, where $\mu_i$ is the $i$th moment about the mean, we can see that the statistics are identically distributed, with dimension acting as a scale parameter. This is exemplified in Figures 2.2a through 2.2c, which show frequency graphs for $p = 1$, 2, and 3. Figure 2.3 depicts the statistics' frequencies for $p = 1$ through 5 plotted against a common abscissa. From the resulting moments and graphs, the statistic $\omega^2_{n,p}$ appears to be distributed as a non-central chi-square random variable, as expected from the left-hand side of (2.3).
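The coefficients $\alpha_3$ and $\alpha_4$ can be computed directly from central moments. The sketch below (helper names are hypothetical) applies them to the moments of $\omega^2_{600,1}$ listed in Table 2.6 ($\sigma^2 = .001176$, $10^3\mu_3 = .07025$, $10^4\mu_4 = .10764$), reading those entries as the second, third and fourth moments about the mean.

```python
def central_moments(sample):
    """Return (mu2, mu3, mu4): second, third and fourth moments about the mean."""
    n = len(sample)
    mean = sum(sample) / n
    mu2 = sum((x - mean) ** 2 for x in sample) / n
    mu3 = sum((x - mean) ** 3 for x in sample) / n
    mu4 = sum((x - mean) ** 4 for x in sample) / n
    return mu2, mu3, mu4


def skewness_kurtosis(mu2, mu3, mu4):
    """alpha_3 = mu3 / mu2**(3/2) and alpha_4 = mu4 / mu2**2."""
    return mu3 / mu2 ** 1.5, mu4 / mu2 ** 2


# Moments of omega^2_{600,1} as listed in Table 2.6.
a3, a4 = skewness_kurtosis(0.001176, 0.07025e-3, 0.10764e-4)
print(round(a3, 3), round(a4, 3))
```

With these inputs the coefficients come out near 1.742 and 7.783, close to the $p = 1$ skewness and kurtosis in Table 2.3.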
The statistic $\omega^2_{n*p}$ was also investigated through simulations of dimension 2, 10, and 20. As expected, the distribution of $\omega^2_{n*p}$ matches that of $\omega^2_{600,1}$.

The statistic $\max_p \omega^2_n$ was investigated for sample size 600 for $p = 1, \ldots, 12$. Results are given in Tables 2.4 and 2.5. Thus we have three approaches based upon the Cramer-von Mises statistic to test for multivariate gaussianity.
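For readers who want to reproduce univariate entries, the following sketch computes $\omega^2_{n,1}$ and estimates an upper percentage point by simulation. It assumes $\omega^2_{n,1}$ is the usual Cramer-von Mises statistic for normality with the mean and standard deviation estimated from the sample (standard deviation with divisor $n - 1$); the paper's own definition in (2.3) may differ in detail, and the function names are hypothetical.

```python
import math
import random


def normal_cdf(z):
    """Standard normal distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cvm_normal(sample):
    """Cramer-von Mises omega^2 for normality with mean and s.d. estimated."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    z = sorted(normal_cdf((x - mean) / sd) for x in sample)
    # omega^2 = 1/(12n) + sum_{i=1}^n (z_(i) - (2i - 1)/(2n))^2; enumerate starts at 0
    return 1.0 / (12 * n) + sum((zi - (2 * i + 1) / (2.0 * n)) ** 2
                                for i, zi in enumerate(z))


def mc_upper_point(n, alpha, reps, seed=0):
    """Monte Carlo estimate of the upper-alpha point of omega^2_{n,1} under H0."""
    rng = random.Random(seed)
    stats = sorted(cvm_normal([rng.gauss(0.0, 1.0) for _ in range(n)])
                   for _ in range(reps))
    return stats[math.ceil((1.0 - alpha) * reps) - 1]
```

For example, `mc_upper_point(20, 0.10, 2000, seed=1)` should land near the $p = 1$, $n = 20$, $\alpha = .10$ entry of Table 2.1 (1.020 in units of $10^{-1}$); with only 2,000 replications the Monte Carlo error is a few units in the third decimal.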
Comparison  with  Previous  Investigations. 


The  results  of  our  univariate  simulations  for  sample  size  600  match  those 
of  other  investigators  as  given  in  Table  2.6.  This  validates  our  results 
for  the  univariate  case. 

In the case of the multivariate composite hypothesis, little has been achieved. Koziol (1982) considers the empirical process (2.3) as we do, but uses the transformation $Y_j = (X_j - \bar{X}) D^{-1} (X_j - \bar{X})^t$. Thus the $Y_j$ are asymptotically chi-squared random variables with $p$ degrees of freedom.
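Koziol's transformation reduces each observation to a squared Mahalanobis distance. A minimal sketch, assuming $D$ is the sample covariance matrix with divisor $n$ (its definition is not restated in this section) and using a small Gauss-Jordan inverse so that only the standard library is needed; the helper names are hypothetical, and the example data are the first six points of Table 2.11.

```python
def mean_vector(data):
    n, p = len(data), len(data[0])
    return [sum(row[k] for row in data) / n for k in range(p)]


def covariance(data):
    """Sample covariance matrix D with divisor n (an assumption)."""
    n, p = len(data), len(data[0])
    xbar = mean_vector(data)
    return [[sum((row[a] - xbar[a]) * (row[b] - xbar[b]) for row in data) / n
             for b in range(p)] for a in range(p)]


def inverse(m):
    """Gauss-Jordan inverse (partial pivoting) of a small nonsingular matrix."""
    p = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(m)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def koziol_distances(data):
    """Y_j = (X_j - Xbar) D^{-1} (X_j - Xbar)^t for each observation X_j."""
    xbar = mean_vector(data)
    dinv = inverse(covariance(data))
    p = len(xbar)
    out = []
    for row in data:
        d = [row[k] - xbar[k] for k in range(p)]
        out.append(sum(d[a] * dinv[a][b] * d[b] for a in range(p) for b in range(p)))
    return out


pts = [(-2.732, 6.557, 25.507), (-5.264, 5.253, 24.200), (-5.103, 5.986, 26.446),
       (-3.335, 5.888, 23.947), (-5.420, 5.607, 25.321), (-3.261, 7.697, 27.479)]
y = koziol_distances(pts)
```

With divisor $n$ the distances always sum to exactly $np$ (here 18), since $\sum_j (X_j - \bar X) D^{-1} (X_j - \bar X)^t = \operatorname{tr}(D^{-1} \cdot nD)$; this is a convenient check on an implementation.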

Pettitt (1979) again uses the empirical process (2.7) but differs in the transformation, $Y_j = A(D)(X_j - \bar{X})^t$, where $A(D) = \Lambda^{-1/2} P$, with $\Lambda$ and $P$ defined above. He, like Koziol, numerically inverts the characteristic function to obtain results. As Pettitt's transformation is different from ours, a simulation of 20,000 replications of 600 bivariate samples was run using his methodology. A comparison of results is given in Table 2.7. The results are in agreement and thus validate the source code of our simulation for dimension $p \geq 2$.
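Pettitt's transformation whitens the centered data. For $p = 2$ the eigen-decomposition of $D$ has a closed form, so the sketch below computes $Y_j = \Lambda^{-1/2} P (X_j - \bar X)^t$ directly. It assumes $D$ is the sample covariance matrix with divisor $n$, that the rows of $P$ are unit eigenvectors of $D$, and that $\Lambda$ holds the matching eigenvalues (our reading of "$\Lambda$ and $P$ defined above"); the function name is hypothetical.

```python
import math


def whiten_bivariate(data):
    """Return Y_j = Lambda^{-1/2} P (X_j - Xbar)^t for bivariate data."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    a = sum((x - mx) ** 2 for x, _ in data) / n           # D = [[a, b], [b, c]]
    c = sum((y - my) ** 2 for _, y in data) / n
    b = sum((x - mx) * (y - my) for x, y in data) / n
    half_tr, half_diff = (a + c) / 2.0, (a - c) / 2.0
    r = math.hypot(half_diff, b)
    lambdas = [half_tr + r, half_tr - r]                  # eigenvalues of D
    if b == 0.0:
        vecs = [(1.0, 0.0), (0.0, 1.0)] if a >= c else [(0.0, 1.0), (1.0, 0.0)]
    else:
        vecs = []
        for lam in lambdas:
            # (b, lam - a) and (lam - c, b) are both eigenvectors; keep the
            # longer one for numerical stability, then normalize it.
            vx, vy = max((b, lam - a), (lam - c, b), key=lambda v: math.hypot(*v))
            norm = math.hypot(vx, vy)
            vecs.append((vx / norm, vy / norm))
    out = []
    for x, y in data:
        dx, dy = x - mx, y - my
        out.append(tuple((vx * dx + vy * dy) / math.sqrt(lam)
                         for (vx, vy), lam in zip(vecs, lambdas)))
    return out


pts = [(-2.732, 6.557), (-5.264, 5.253), (-5.103, 5.986),
       (-3.335, 5.888), (-5.420, 5.607), (-3.261, 7.697)]
ys = whiten_bivariate(pts)
```

By construction the transformed sample has zero mean and identity covariance matrix (divisor $n$).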

Powers of the Tests, $\omega^2_{n,p}$ and $\omega^2_{n*p}$, for Gaussianity.

The powers of the tests for gaussianity were computed by Monte Carlo simulation and are compared with several tests for multivariate gaussianity. The power study included the multivariate skewness statistic of Mardia (1974),

$$b_{1p} = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \left\{ (x_i - \bar{x}) D^{-1} (x_j - \bar{x})^t \right\}^3,$$

the multivariate kurtosis statistic of Mardia (1974),

$$b_{2p} = \frac{1}{n} \sum_{i=1}^{n} \left\{ (x_i - \bar{x}) D^{-1} (x_i - \bar{x})^t \right\}^2,$$

the Shapiro-Wilk statistic of Malkovich and Afifi (1973),

$$SW = \frac{\left[ \sum_{j=1}^{n} a_j u_{(j)} \right]^2}{q_m},$$

and the Anderson-Darling statistic of Paulson, et al. (1986),

$$A^2_{n,p} = -\frac{1}{n} \sum_{i=1}^{n} (2i - 1) \left\{ \log G_p(q_{(i)}) + \log\left[1 - G_p(q_{(n+1-i)})\right] \right\} - n,$$

where $G_p(x)$ is the distribution function of a chi-squared variate on $p$ degrees of freedom; the $q_{(j)}$ are the $q_j$ in ascending order, given by

$$q_j = (x_j - \bar{x}) D^{-1} (x_j - \bar{x})^t;$$

$m$ is the index for which $q_j$ achieves its maximum, i.e.,

$$q_m = \max_{1 \le j \le n} q_j;$$

the $u_{(j)}$ are the $u_j$ in ascending order,

$$u_j = (x_m - \bar{x}) D^{-1} (x_j - \bar{x})^t;$$

and the $a_j$ are the Shapiro-Wilk constants tabulated in Shapiro (1980).
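Mardia's two statistics are simple to compute once $D^{-1}$ is available. The sketch assumes $D$ is the sample covariance matrix with divisor $n$, as in Mardia's definition; the Shapiro-Wilk and Anderson-Darling versions are not sketched because they need the tabulated $a_j$ and the chi-squared distribution function. The helper names are hypothetical.

```python
def invert(m):
    """Gauss-Jordan inverse (partial pivoting) of a small nonsingular matrix."""
    p = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(m)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def mardia(data):
    """Return (b_1p, b_2p) for a list of p-dimensional observations."""
    n, p = len(data), len(data[0])
    xbar = [sum(row[k] for row in data) / n for k in range(p)]
    dev = [[row[k] - xbar[k] for k in range(p)] for row in data]
    # D: sample covariance matrix with divisor n (assumption)
    cov = [[sum(d[a] * d[b] for d in dev) / n for b in range(p)] for a in range(p)]
    inv = invert(cov)
    # g[i][j] = (x_i - xbar) D^{-1} (x_j - xbar)^t
    g = [[sum(di[a] * inv[a][b] * dj[b] for a in range(p) for b in range(p))
          for dj in dev] for di in dev]
    b1 = sum(gij ** 3 for row in g for gij in row) / n ** 2
    b2 = sum(g[i][i] ** 2 for i in range(n)) / n
    return b1, b2
```

For $p = 1$ the two statistics reduce to the squared sample skewness and the sample kurtosis, $b_{11} = m_3^2/m_2^3$ and $b_{21} = m_4/m_2^2$, which gives an easy consistency check.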

Table 2.8 provides powers (in percent) for the composite test of $p$-dimensional gaussianity for $p = 1$, 2, and 5, $n = 20$ and 50, and size of test 0.10, for $\omega^2_{n,p}$ and $\max_p \omega^2_n$. All powers are based on 1,000 independent replications of the test for gaussianity under the alternative listed. The powers of the competing tests are taken from Paulson (1973). For $p > 1$ we have provided the powers of six statistics for testing the hypothesis of gaussianity against the alternatives that the true distributions are $p$-variate chi-squared, $p$-variate $t$, $p$-variate Dirichlet, $p$-variate lognormal, and $p$-variate mixtures of gaussians. The definitions of these alternative distributions follow.
Let $(x_{1t}, x_{2t}, \ldots, x_{pt})$, $t = 1, 2, \ldots, r$, be distributed as $N_p(0, D)$, where $D$ is a positive definite covariance matrix. If $y_j = \sum_{t=1}^{r} x_{jt}^2$, $j = 1, 2, \ldots, p$, then $(y_1, y_2, \ldots, y_p)$ is distributed as a $p$-variate chi-squared with correlation matrix $D$ on $r$ degrees of freedom. If $W^2$, a chi-squared variate on $r$ degrees of freedom, is independent of the $x_{j1}$'s and $t_j = x_{j1}/(W^2/r)^{1/2}$, $j = 1, 2, \ldots, p$, then $(t_1, t_2, \ldots, t_p)$ follows a $p$-variate $t$ distribution on $r$ degrees of freedom. If $y_{jt} = \exp(x_{jt})$, $j = 1, 2, \ldots, p$, then $(y_{1t}, y_{2t}, \ldots, y_{pt})$ is distributed as a $p$-variate lognormal. The vector $y$ has the mixture of gaussian distributions $bN_p(\mu, D) + (1 - b)N_p(\mu', D')$ if, in the random sample of $n$ $y$'s, $bn$ of the $y$'s have distribution $N_p(\mu, D)$ and $(1 - b)n$ of the $y$'s have distribution $N_p(\mu', D')$, where $0 < b < 1$ and $bn$ is an integer. Following Wilks (1962), let $x_1, x_2, \ldots, x_{p+1}$ be independent random variables having gamma distributions $G(\nu_1), G(\nu_2), \ldots, G(\nu_{p+1})$, and let $y_i = x_i/(x_1 + x_2 + \cdots + x_{p+1})$, where

$$f(x; \nu) = \frac{x^{\nu - 1} e^{-x}}{\Gamma(\nu)}, \qquad x > 0;$$

then $(y_1, y_2, \ldots, y_p)$ has the $p$-variate Dirichlet distribution $D(\nu_1, \nu_2, \ldots, \nu_p; \nu_{p+1})$. When $p = 1$ we have the Beta distribution.
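The constructions above translate directly into samplers. This sketch uses only Python's `random` module (`gauss`, `gammavariate`) and a hand-written Cholesky factor to draw correlated normals; it illustrates the definitions and is not the generator used for Table 2.8. All function names are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^t = a, for a positive definite matrix a."""
    p = len(a)
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def normal_vector(mean, cov, rng):
    """One draw from N_p(mean, cov)."""
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(low[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mean)]


def chi_squared_vector(r, cov, rng):
    """y_j = sum_{t=1}^r x_{jt}^2 with (x_{1t}, ..., x_{pt}) ~ N_p(0, cov)."""
    zero = [0.0] * len(cov)
    draws = [normal_vector(zero, cov, rng) for _ in range(r)]
    return [sum(x[j] ** 2 for x in draws) for j in range(len(cov))]


def t_vector(r, cov, rng):
    """t_j = x_{j1} / sqrt(W^2 / r), W^2 ~ chi-squared(r) independent of x."""
    x = normal_vector([0.0] * len(cov), cov, rng)
    w2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(r))
    return [xj / math.sqrt(w2 / r) for xj in x]


def lognormal_vector(cov, rng):
    """y_j = exp(x_j) with x ~ N_p(0, cov)."""
    return [math.exp(xj) for xj in normal_vector([0.0] * len(cov), cov, rng)]


def dirichlet_vector(nus, rng):
    """(y_1, ..., y_p) from p + 1 independent G(nu_i) variates (Wilks' construction)."""
    g = [rng.gammavariate(nu, 1.0) for nu in nus]
    total = sum(g)
    return [gi / total for gi in g[:-1]]


def mixture_sample(n, b, mean1, cov1, mean2, cov2, rng):
    """b*n draws from N_p(mean1, cov1) followed by (1 - b)*n from N_p(mean2, cov2)."""
    k = round(b * n)
    if abs(b * n - k) > 1e-9:
        raise ValueError("b * n must be an integer")
    return ([normal_vector(mean1, cov1, rng) for _ in range(k)]
            + [normal_vector(mean2, cov2, rng) for _ in range(n - k)])
```

A usage example: `dirichlet_vector([1.0, 1.0, 1.0], random.Random(0))` gives one bivariate Dirichlet(1,1,1) point of the kind used in Table 2.8(c) and (d).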


Table 2.8 provides evidence that both $\omega^2_{n,p}$ and $\max_p \omega^2_n$ are excellent omnibus tests, except for small sample sizes $n$ and short-tailed alternatives. The statistic $\omega^2_{n*p}$ was the least powerful of the three statistics considered, and its results are omitted. The performance of all tests improves with increasing sample size. The test $\omega^2_{n,p}$ as a rule dominates $\max_p \omega^2_n$ because of the inherent loss of information concerning the $p$-dimensional structure in the formulation of $\max_p \omega^2_n$. This is offset by the ability of $\max_p \omega^2_n$ to indicate which margin(s) are in fact causing the non-gaussianity. Thus these statistics would be used in tandem. However, little is known about the nature of non-gaussianity upon rejection of the null hypothesis, so we recommend the use of Mardia's $b_{1p}$ and $b_{2p}$ in conjunction with the proposed statistics.
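The diagnostic use of $\max_p \omega^2_n$ described above amounts to reporting which margin attains the maximum. A minimal sketch, assuming $\max_p \omega^2_n$ is the largest of the $p$ univariate statistics, each computed on one margin with its own estimated mean and variance, and again taking the univariate statistic to be the usual Cramer-von Mises form; the function names are hypothetical.

```python
import math
import random


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cvm_normal(sample):
    """Univariate Cramer-von Mises omega^2 for normality, parameters estimated."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    z = sorted(normal_cdf((x - mean) / sd) for x in sample)
    return 1.0 / (12 * n) + sum((zi - (2 * i + 1) / (2.0 * n)) ** 2
                                for i, zi in enumerate(z))


def max_margin_statistic(data):
    """Return (max over margins of omega^2, index of that margin, all margins)."""
    p = len(data[0])
    margins = [cvm_normal([row[k] for row in data]) for k in range(p)]
    worst = max(range(p), key=lambda k: margins[k])
    return margins[worst], worst, margins


rng = random.Random(11)
data = [(rng.gauss(0.0, 1.0), rng.expovariate(1.0)) for _ in range(200)]
stat, idx, margins = max_margin_statistic(data)
```

Here the first margin is gaussian and the second exponential, so the second margin (index 1) is the one flagged.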
While the Shapiro-Wilk statistic is a somewhat better omnibus statistic than $b_{1p}$ or $b_{2p}$, $b_{1p}$ is naturally better for skewed alternatives such as the chi-squared family, and $b_{2p}$ is inherently better for the more peaked and longer-tailed distributions such as the $t$-family. These results are consistent with the univariate case. Our results also indicate that the Anderson-Darling test has a marked loss of power as $p$ increases, while our proposed statistics, especially $\omega^2_{n,p}$, do not.
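Each power in Table 2.8 is a rejection rate over 1,000 simulated samples, so it carries a binomial Monte Carlo error of $\sqrt{\pi(1-\pi)/1000}$, about 1.6 percentage points at $\pi = 0.5$. A sketch for the univariate case, assuming the usual Cramer-von Mises form of $\omega^2_{n,1}$ and taking the $p = 1$, $n = 20$, $\alpha = .10$ critical value 0.1020 from Table 2.1; the function names are hypothetical.

```python
import math
import random


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cvm_normal(sample):
    """Univariate Cramer-von Mises omega^2 for normality, parameters estimated."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    z = sorted(normal_cdf((x - mean) / sd) for x in sample)
    return 1.0 / (12 * n) + sum((zi - (2 * i + 1) / (2.0 * n)) ** 2
                                for i, zi in enumerate(z))


def mc_power(sampler, n, critical_value, reps=1000, seed=0):
    """Fraction of reps samples of size n whose omega^2 exceeds critical_value."""
    rng = random.Random(seed)
    rejections = sum(cvm_normal([sampler(rng) for _ in range(n)]) > critical_value
                     for _ in range(reps))
    return rejections / reps


def mc_standard_error(power, reps):
    """Binomial standard error of a Monte Carlo power estimate."""
    return math.sqrt(power * (1.0 - power) / reps)


CRIT_N20_ALPHA10 = 1.020e-1  # Table 2.1: p = 1, n = 20, alpha = .10


def chi2_1(rng):
    """One chi-squared(1) variate: the square of a standard normal."""
    return rng.gauss(0.0, 1.0) ** 2


def std_normal(rng):
    return rng.gauss(0.0, 1.0)
```

Against $\chi^2(1)$ samples of size 20 the estimate should fall in the same high range as the 96.4 percent of Table 2.8(a), and under gaussian samples it should be near the nominal 10 percent, up to the Monte Carlo error.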

Some  Examples  and  Discussions. 

EXAMPLE 1. The length and breadth of 9,440 beans as measured by W. Johannsen and studied by Wicksell, taken from Pretorius (1930), are given in Table 2.9, and a contour graph in Figure 2.4. The data are considered in four ways.

Let X = length and Y = breadth. Taken independently, the margin of X yields an $\omega^2_{9440,1}$ statistic of 34.9934 and the margin of Y yields an $\omega^2_{9440,1}$ statistic of 42.8855.

Taken as 9440 bivariate observations, $\omega^2_{9440,2} = 897.5406$ and $\omega^2_{9440*2} = 2103.6814$. All observed statistics are much greater than $\max(\omega^2_{600,2}) = 0.3302$ and $\max(\omega^2_{600,1}) = 0.1485$. Clearly the margins and the bivariate observations are non-gaussian.

EXAMPLE 2. Mardia (1970) gives the number of Mullerian glands on the right and left forelegs of 2,000 male pigs, where x is the number of glands on the right leg and y is the number of glands on the left leg. The data are presented in Table 2.10 and a contour graph in Figure 2.5. Proceeding as above, the margin of X yields an $\omega^2_{2000,1}$ statistic of 5.727 and the margin of Y yields 5.5918. Taken as a bivariate sample, $\omega^2_{2000,2} = 2.2480$ and $\omega^2_{2000*2} = 47.9995$. Again, the data are clearly non-gaussian.

In the above two examples the obvious non-gaussianity of the observations does not allow either statistic, $\omega^2_{n,p}$ or $\omega^2_{n*p}$, to be shown better than the other.
EXAMPLE 3. Gnanadesikan (1977, pp. 50-52) gives and discusses a set of 61 trivariate observations which are constructed by systematically taking points from the surface of a paraboloid and adding spherical gaussian noise to each. The data are given in Table 2.11, and a graph of X, Y, and Z versus observation number, 1-61, is presented in Figure 2.6. From Figure 2.6 we see no non-linearity and may be led to believe the data are in fact three-dimensional gaussian. The margins of X, Y, and Z yield $\omega^2_{61,1}$ statistics of 0.0239, 0.0833, and 0.1092, indicating that X and Y are gaussian while Z is not, with p-value 0.0845. Taken as trivariate observations, $\omega^2_{61,3} = 2.2875$ with associated p-value of 0 ($\max(\omega^2_{200,3}) = .0624$), and $\omega^2_{61*3} = 0.0654$ with associated p-value of 0.3287. We see here an example where the statistic $\omega^2_{n,p}$ outperforms the statistic $\omega^2_{n*p}$. As in the power study above, we have lost the three-dimensional structure of the data and allowed the gaussian margins, X and Y, to influence the statistic $\omega^2_{61*3}$. The performance of these statistics is compared with the competitors in Table 2.14. We see that $\omega^2_{n,p}$ outperforms all others.
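The p-values quoted in these examples can be read as tail areas of a simulated null distribution. This section does not say exactly how they were obtained, so the following is an assumed Monte Carlo version for the univariate statistic, using the common $(1 + \#\{T_b \ge t\})/(B + 1)$ estimator and the usual Cramer-von Mises form of $\omega^2_{n,1}$; the names are hypothetical.

```python
import math
import random


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cvm_normal(sample):
    """Univariate Cramer-von Mises omega^2 for normality, parameters estimated."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    z = sorted(normal_cdf((x - mean) / sd) for x in sample)
    return 1.0 / (12 * n) + sum((zi - (2 * i + 1) / (2.0 * n)) ** 2
                                for i, zi in enumerate(z))


def mc_p_value(observed, n, reps=2000, seed=0):
    """(1 + #{simulated omega^2 >= observed}) / (reps + 1) under gaussian H0."""
    rng = random.Random(seed)
    exceed = sum(cvm_normal([rng.gauss(0.0, 1.0) for _ in range(n)]) >= observed
                 for _ in range(reps))
    return (1 + exceed) / (reps + 1)
```

Applied to the Z-margin value 0.1092 with $n = 61$, `mc_p_value(0.1092, 61)` should give a p-value of the same order as the 0.0845 reported above, within Monte Carlo error.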

EXAMPLE 4. The Iris data, Table 2.12, of Fisher (1936) have been extensively studied and are used to evaluate clustering algorithms (Nicholson, 1982).

In particular, the iris versicolor and iris virginica groups are very difficult to separate. The reason for this difficulty is indicated by the results listed in Table 2.13. The margins of sepal and petal length, SL and PL, for iris versicolor and iris virginica are shown to be gaussian, with the same p-values for each margin. For these two varieties, the margins of sepal and petal width, SW and PW, are shown to be non-gaussian.

The extent of the departures from gaussianity is nearly the same for the two varieties' margins. The ability to separate iris setosa from the other two varieties is due to the non-gaussianity of iris setosa's margins PL and PW and the acceptance of gaussianity of the margins SL and SW. Gnanadesikan (1977, pp. 218-222) finds observation numbers 16 and 42 to be unusual in the iris setosa data set. This explains the large observed statistics for iris setosa's margins PL and especially PW.

While we cannot conclude directly that there are two populations represented by the combined sets, we are able to conclude that a 4-variate gaussian model is not adequate for the data sets taken independently nor as a whole.
                                           SAMPLE SIZE n
 p   scale    α       8      10      12      15      20      24      30      40      60     120
 1   ×10     .10   0.972   0.994   0.985   1.007   1.020   1.015   1.022   1.026   1.015   1.019
             .05   1.181   1.191   1.194   1.226   1.236   1.224   1.216   1.240   1.236   1.223
             .01   1.672   1.649   1.707   1.713   1.705   1.753   1.767   1.727   1.747   1.798
 2   ×100    .10   5.932   5.933   5.957   5.971   6.084   6.009   6.147   6.076   6.099   6.141
             .05   6.829   6.862   6.937   6.874   6.974   6.956   7.103   7.103   6.990   7.047
             .01   8.814   9.176   9.150   9.131   9.136   9.124   9.322   9.302   9.339   9.124
 3   ×100    .10   2.819   2.847   2.838   2.871   2.861   2.857   2.900   2.898   2.876   2.863
             .05   3.186   3.243   3.208   3.255   3.277   3.259   3.244   3.276   3.272   3.218
             .01   4.153   4.170   4.192   4.164   4.132   4.095   4.096   4.089   4.072   4.103
 4   ×100    .10   1.235   1.240   1.242   1.244   1.246   1.251   1.234   1.230   1.226   1.229
             .05   1.403   1.398   1.397   1.399   1.406   1.414   1.378   1.367   1.362   1.369
             .01   1.789   1.787   1.783   1.778   1.737   1.734   1.701   1.705   1.701   1.680
 5   ×1000   .10   5.143   5.186   5.092   5.049   5.093   5.012   5.023   5.010   4.996   5.005
             .05   5.839   5.792   5.784   5.719   5.682   5.589   5.578   5.553   5.559   5.540
             .01   7.641   7.390   7.633   7.219   7.000   6.930   6.795   6.784   6.764   6.822
 6   ×1000   .10   2.076   2.045   2.047   2.020   1.997   1.990   1.995   1.980   1.978   1.951
             .05   2.348   2.704   2.317   2.256   2.227   2.228   2.208   2.208   2.195   2.159
             .01   3.078   2.980   3.005   2.890   2.791   2.771   2.743   2.699   2.659   2.601

TABLE 2.1  Critical Values of $\omega^2_{n,p}$; p = 1,...,6; n = 8,...,120
                                        ALPHA
DIMENSION   scale        .20      .15      .10      .05      .025     .01      .001
  1        (×10)       0.8197   0.9111   1.0406   1.2555   1.4895   1.7984   2.0319
  2        (×10)       0.5138   0.5563   0.6099   0.6997   0.8054   0.9215   1.0105
  3        (×10^2)     2.4726   2.6642   2.3982   3.2456   3.6272   4.0366   4.4276
  4        (×10^2)     1.0815   1.1484   1.2366   1.3806   1.5243   1.7115   1.8329
  5        (×10^3)     4.4361   4.6809   5.0047   5.5123   6.0138   6.5131   6.8977
  6        (×10^3)     1.7547   1.8388   1.9599   2.1637   2.3519   2.5731   2.7373
  7        (×10^3)     0.6771   0.7092   0.7526   0.8205   0.8949   0.9837   1.0284
  8        (×10^4)     2.5490   2.6700   2.8200   3.0672   3.3050   3.6115   3.8463
  9        (×10^4)     0.9514   0.9911   1.0399   1.1259   1.2078   1.3060   1.3848
 10        (×10^5)     3.4812   3.6211   3.8172   4.1167   4.4562   4.3273   5.1069
 11        (×10^5)     1.2718   1.3226   1.3867   1.4903   1.5909   1.7311   1.8343
 12        (×10^6)     4.5909   4.7583   4.9943   5.4018   5.7826   6.2861   6.5937

TABLE 2.2  Critical Values of $\omega^2_{n,p}$; p = 1,...,12; n = 600
                                   MOMENTS
DIMENSION   MEAN               VARIANCE            SKEWNESS   KURTOSIS
  1         5.9772 (×10^2)     1.1764 (×10^3)      1.7410     7.7783
  2         3.9730 (×10^2)     2.6254 (×10^4)      1.2546     5.3986
  3         1.9819 (×10^2)     4.5358 (×10^5)      1.0597     4.6504
  4         8.8850 (×10^3)     6.9824 (×10^6)      0.9942     4.5859
  5         3.6752 (×10^3)     9.8374 (×10^7)      0.9095     4.5397
  6         1.4759 (×10^3)     1.3755 (×10^7)      0.9979     5.0131
  7         5.7300 (×10^4)     1.8452 (×10^8)      0.8758     4.3227
  8         2.1786 (×10^4)     2.3771 (×10^9)      0.8592     4.4465
  9         8.1670 (×10^5)     2.9314 (×10^10)     0.7513     4.1196
 10         3.0213 (×10^3)     3.7224 (×10^11)     0.8196     4.2651
 11         1.1089 (×10^5)     4.5496 (×10^12)     0.8138     4.5588
 12         4.0240 (×10^5)     5.7521 (×10^13)     0.9013     4.8506

TABLE 2.3  Moments of $\omega^2_{n,p}$; p = 1,...,12; n = 600
                                    ALPHA
DIMENSION     .20      .15      .10      .05      .025     .01      .005
  1         .08196   .09110   .10405   .12555   .14894   .17983   .20319
  2         .10180   .11183   .12555   .14799   .17321   .20212   .22654
  3         .11499   .12505   .13840   .16161   .18494   .21204   .23462
  4         .12299   .13350   .14674   .16986   .19156   .22948   .25401
  5         .13123   .14181   .15638   .17833   .20025   .23273   .25500
  6         .13641   .14668   .16175   .18599   .20688   .24088   .26610
  7         .14053   .15081   .16511   .18740   .21220   .24385   .26315
  8         .14609   .15598   .17195   .19474   .21934   .24312   .27404
  9         .15011   .16027   .17397   .19823   .22126   .25107   .27147
 10         .15280   .16295   .17754   .20038   .22291   .25119   .27394
 11         .15587   .16643   .18014   .20390   .22637   .25679   .27577
 12         .15719   .16723   .18201   .20572   .22999   .25364   .27739

TABLE 2.4  Critical Values of $\max_p \omega^2_n$; p = 1,...,12; n = 600
                          MOMENTS
DIMENSION   MEAN      VARIANCE   SKEWNESS   KURTOSIS
  1        0.05977    0.00117    1.74100    7.77820
  2        0.07731    0.00138    1.58188    7.07833
  3        0.08820    0.00146    1.39058    5.98084
  4        0.09684    0.00150    1.43042    6.52263
  5        0.10344    0.00158    1.30936    5.79542
  6        0.10881    0.00162    1.39810    6.57393
  7        0.11269    0.00159    1.31411    5.37041
  8        0.11752    0.00168    1.32513    5.99709
  9        0.12123    0.00169    1.32798    6.23711
 10        0.12354    0.00164    1.21282    5.31291
 11        0.12732    0.00164    1.28152    6.09602
 12        0.12880    0.00163    1.18409    5.05303

TABLE 2.5  Moments of $\max_p \omega^2_n$; p = 1,...,12; n = 600
(1 - α)     $\omega^2_{600,1}$    D,K&T, P&H, S (as printed)
 .01          .01669             .01651
 .05          .02231             .02228
 .10          .02647             .02638
 .20          .03277             .03269
 .50          .05125             .05087
 .80          .08197             .08114
 .85          .09111             .091
 .90          .10406             .10354    .104
 .95          .12555             .12602    .126
 .975         .14894             .148
 .99          .17990             .17878    .178
 μ            .05977             .0595
 σ²           .001176            .00117
 10³μ₃        .07025             .0709
 10⁴μ₄        .10764             .1116
 α₃           1.7140             1.780
 α₄           7.7783             3.186

D,K&T: Durbin, Knott and Taylor (1975)
P&H: Pearson and Hartley (1972)
S: Stephens (1976)

TABLE 2.6  Univariate $\omega^2_{n,p}$ Comparisons
  α      Pettitt   Monte Carlo
 .20      .075       .068
 .15      .080       .076
 .10      .088       .087
 .05      .100       .106
 .025     .112       .127
 .01      .128       .157
 .005     .140       .184

TABLE 2.7  Comparison of $\omega^2_{n,2}$ Using Pettitt's Transformation
Powers of $\omega^2_{n,p}$, $\max_p \omega^2_n$, Mardia's $b_{1p}$ and $b_{2p}$, Shapiro-Wilk (SW), Anderson-Darling (AD), and Kolmogorov-Smirnov (KS) tests for gaussianity at significance level $\alpha = .10$; $p = 1, 2, 5$; and $n = 20, 50$

(a) p = 1, n = 20

                                     STATISTIC
ALTERNATIVE              ω²(n,1)   b(1p)   b(2p)    SW     AD     KS
χ²(1)                     96.4     90.0     6.0    99.2   99.2   92.5
χ²(2)                     82.5     69.1    44.4    91.4   90.3   73.5
χ²(4)                     54.7     47.6    31.7    65.8   64.0   41.5
χ²(6)                     40.7     36.8    25.2    50.9   52.0   34.0
χ²(10)                    27.5     25.2    20.6    36.6   38.0   25.5
χ²(14)                    21.4     20.1    16.8    28.9   29.0   19.0
t(1)                      89.9     78.7    89.3    87.7   94.4   90.6
t(3)                      36.9     36.7    41.8    41.8   44.0   33.5
t(5)                      21.6     21.4    26.2    24.3   29.5   21.0
t(7)                      16.8     17.9    18.1    20.0   22.5   15.0
t(9)                      13.9     16.9    16.8    17.4   19.7   14.5
Lognormal                 93.2     87.1    66.5    96.5   96.0   85.7
Logistic                  13.9     17.4    19.5    17.7   19.5   15.0
Beta(1,1)                 28.6      6.5    46.5    34.3   37.0   13.5
    (1,3)                 52.6     31.5    22.4    63.7   63.0   38.0
    (1,5)                 61.0     48.6    29.1    76.1   75.5   53.5
    (2,1)                 38.8     12.0    20.9    48.8   45.3   30.0
    (5,1)                 63.0     50.1    27.6    80.1   73.8   52.5
.8N(0,1) + .2N(0,4)        9.6     25.9    26.0    22.7   30.0   16.5
.9N(0,1) + .1N(0,4)        8.7     19.7    19.9    19.1   25.5   13.0
.7N(0,1) + .3N(1,4)        7.9     28.3    23.8    28.8   35.0   24.0

TABLE 2.8  Power Study Table
(b) p = 1, n = 50

                                     STATISTIC
ALTERNATIVE              ω²(n,1)   b(1p)   b(2p)    SW     AD     KS
χ²(1)                     100      99.9    92.9    100    100    100
χ²(2)                     99.4     98.3    72.6    100    100    98.3
χ²(4)                     89.7     89.5    52.8    97.3   93.9   80.2
χ²(6)                     74.9     77.6    43.3    88.9   85.0   65.5
χ²(10)                    54.3     57.5    28.8    72.4   66.0   48.0
χ²(14)                    41.3     45.1    26.5    58.2   50.0   34.7
t(1)                      100      91.2    99.3    99.7   99.8   99.6
t(3)                      65.7     57.3    73.2    65.3   70.0   59.7
t(5)                      34.6     32.9    46.6    36.1   42.5   31.0
t(7)                      21.9     29.2    34.9    23.6   29.5   20.0
t(9)                      18.5     21.6    25.1    18.2   23.5   15.0
Lognormal                 100      99.9    96.1    100    100    99.3
Logistic                  21.8     18.8    25.8    20.4   22.6   15.1
Beta(1,1)                 61.4      9.1    92.2    94.4   77.0   42.0
    (1,3)                 90.0     71.6    19.4    99.8   95.2   80.0
    (1,5)                 96.4     92.0    37.7    99.9   98.7   91.1
    (2,1)                 73.6     31.2    33.1    96.4   85.5   58.5
    (5,1)                 96.4     91.3    36.7    99.8   99.0   89.7
.8N(0,1) + .2N(0,4)       91.2     28.8    42.9    24.5   34.3   23.0
.9N(0,1) + .1N(0,4)       73.0     29.3    34.7    22.4   26.8   16.0
.7N(0,1) + .3N(1,4)       94.6     46.1    42.4    48.7   54.0   39.5

TABLE 2.8  Power Study Table (Cont)
(c) p = 2, 5 and n = 20

                                              STATISTICS
ALTERNATIVE                        ω²(n,p)   max_p ω²(n)   b(1p)   b(2p)    SW     AD
p = 2   χ²(1)                        96.9        95.2       95.0    77.1    96.8   72.9
        χ²(2)                        85.3        77.6       77.2    53.9    83.6   46.7
        χ²(4)                        61.9        52.5       52.4    35.8    60.4   21.8
        χ²(6)                        46.0        40.4       35.6    26.8    46.0   15.3
        χ²(10)                       34.7        28.1       25.3    21.6    34.3   14.6
        χ²(14)                       28.5        25.5       20.9    18.6    26.0   11.1
        t(1)                         100         100        94.8    97.5    95.6   96.8
        t(3)                         93.3        92.1       52.2    58.1    60.3   51.3
        t(5)                         73.9        70.3       33.7    35.8    37.0   28.1
        t(7)                         56.3        53.6       22.7    24.8    32.0   18.3
        t(9)                         41.1        36.8       17.6    13.2    24.8   16.5
        Lognormal                    95.7        100        94.6    79.1    95.3   74.5
        Dirichlet(1,1,1)             26.2        22.3       77.4    55.3    82.8   45.3
                 (1,2,3)             32.4        31.5       65.9    42.3    75.1   34.3
                 (2,1,2)             35.8        33.0       63.9    44.0    74.5   33.4
                 (5,1,5)             61.4        49.3       57.3    38.5    69.1   24.3
                 (5,1,1)             32.6        35.0       54.9    36.6    64.3   25.4
p = 5   χ²(1)                        82.6        71.9       98.2    89.3    92.4   67.6
        χ²(2)                        56.0        42.3       81.3    63.7    74.2   35.3
        χ²(4)                        34.1        23.8       49.2    39.0    48.8   12.3
        χ²(6)                        24.4        16.1       34.0    25.4    38.6    9.4
        χ²(10)                       16.0        13.3       20.7    17.4    26.0    8.0
        χ²(14)                       15.7        11.5       17.6    16.7    24.7    8.5
        t(1)                         100         100        99.9    99.9    99.2   99.3
        t(3)                         100         86.5       79.4    84.3    76.6   56.6
        t(5)                         64.6        55.1       53.9    58.7    54.1   29.9
        t(7)                         42.8        36.1       38.7    43.2    40.9   15.6
        t(9)                         34.4        80.9       31.2    34.5    34.7   12.6
        Lognormal                    87.6        80.4       98.8    94.3    95.3   77.1
        Dirichlet(1,1,1,1,1,1)       18.9        15.6       82.6    64.9    72.8   35.2
                 (1,2,1,2,1,2)       19.3        15.9       71.6    54.9    62.1   23.8
                 (5,1,1,5,1,5)       22.1        19.0       62.4    48.8    57.4   18.6
                 (2,1,1,2,1,1)       19.0        14.8       72.3    54.9    67.9   28.7

TABLE 2.8  Power Study Table (Cont)
(d) p = 2, 5 and n = 50

                                              STATISTICS
ALTERNATIVE                        ω²(n,p)   max_p ω²(n)   b(1p)   b(2p)    SW     AD
p = 2   χ²(1)                        100         99.8       100     97.7    100    99.8
        χ²(2)                        98.9        98.7       99.8    88.5    99.5   88.3
        χ²(4)                        92.5        88.0       97.0    64.8    93.8   56.2
        χ²(6)                        82.7        76.7       86.8    51.9    88.2   37.3
        χ²(10)                       64.7        55.6       64.5    34.3    70.9   20.1
        χ²(14)                       57.4        46.8       54.1    30.9    55.9   17.7
        t(1)                         100         100        99.5    100     100    100
        t(3)                         100         100        77.2    92.5    82.5   91.7
        t(5)                         97.5        95.7       50.3    68.6    57.1   64.1
        t(7)                         88.5        84.6       34.4    50.2    42.4   42.6
        t(9)                         75.7        71.4       27.4    40.3    34.9   17.7
        Lognormal                    100         100        100     99.1    100    99.5
        Dirichlet(1,1,1)             59.1        52.0       100     89.0    99.6   87.9
                 (1,2,3)             52.0        50.0       94.4    79.8    98.0   72.0
                 (2,1,2)             74.8        73.0       99.2    78.0    98.6   73.9
                 (5,1,5)             96.8        93.6       98.1    70.6    93.5   53.8
                 (5,1,1)             71.5        74.4       97.1    69.1    94.4   57.0
p = 5   χ²(1)                        99.3        98.9       100     100     99.9   99.8
        χ²(2)                        94.2        87.7       100     98.0    98.7   94.4
        χ²(4)                        74.5        62.1       98.5    77.7    88.6   66.0
        χ²(6)                        56.2        43.9       92.0    63.9    76.2   42.1
        χ²(10)                       33.3        26.7       71.1    42.8    55.8   24.5
        χ²(14)                       28.3        21.8       54.4    31.4    45.2   15.3
        t(1)                         100         100        100     100     100    100
        t(3)                         100         100        98.4    99.7    97.5   99.5
        t(5)                         96.2        92.6       86.1    95.4    85.4   91.3
        t(7)                         85.4        74.7       70.1    85.5    71.6   73.0
        t(9)                         66.7        54.2       61.4    73.6    58.3   54.7
        Lognormal                    99.9        99.5       100     100     100    100
        Dirichlet(1,1,1,1,1,1)       38.4        34.4       100     97.0    98.7   95.6
                 (1,2,1,2,1,2)       37.4        33.7       100     92.4    96.9   87.2
                 (5,1,1,5,1,5)       49.5        45.2       99.7    88.7    94.7   76.5
                 (2,1,1,2,1,1)       42.0        38.2       100     93.0    96.3   87.9

TABLE 2.8  Power Study Table (Cont)
Powers of $\omega^2_{50,p}$, $\max_p \omega^2_{50}$, and Mardia's $b_{1p}$ and $b_{2p}$ tests for gaussianity on various mixtures.

(e) p = 2, 5 and n = 50

                                            STATISTICS
ALTERNATIVE                        ω²(n,p)   max_p ω²(n)   b(1p)   b(2p)
p = 2   .8N(0,I) + .2N(0,4I)         33.1        30.1       46.9    64.1
        .9N(0,I) + .1N(0,4I)         24.7        21.5       47.9    38.9
        .7N(0,I) + .3N(1,4I)         64.9        54.0       67.1    69.4
        .7N(0,I) + .3N(2,4I)         85.3        79.5       88.1    71.6
        .7N(0,I) + .3N(3,4I)         92.1        88.5       95.2    66.4
        .5N(0,A) + .5N(0,B)          14.4        13.4       17.3    21.3
p = 5   .8N(0,I) + .2N(0,4I)         45.4        32.7       87.8    95.3
        .9N(0,I) + .1N(0,4I)         27.8        21.1       73.9    81.1
        .7N(0,I) + .3N(1,4I)         75.3        58.6       97.1    98.6
        .7N(0,I) + .3N(2,4I)         88.5        77.0       98.6    98.4
        .7N(0,I) + .3N(3,4I)         91.2        83.6       99.3    98.8
        .5N(0,C) + .5N(0,D)          11.1         8.6       26.8    33.5

where

$$C = \begin{pmatrix} 1 & .5 & .5 & .5 & .5 \\ .5 & 1 & .5 & .5 & .5 \\ .5 & .5 & 1 & .5 & .5 \\ .5 & .5 & .5 & 1 & .5 \\ .5 & .5 & .5 & .5 & 1 \end{pmatrix} \quad \text{and} \quad D = \begin{pmatrix} 1 & -.5 & .5 & -.5 & .5 \\ -.5 & 1 & -.5 & .5 & -.5 \\ .5 & -.5 & 1 & -.5 & .5 \\ -.5 & .5 & -.5 & 1 & -.5 \\ .5 & -.5 & .5 & -.5 & 1 \end{pmatrix}.$$

TABLE 2.8  Power Study Table (Cont)
Observed frequencies of the number of beans of length X and breadth Y, measured in millimeters.

                                                  X (length)
   Y       17  16.5    16  15.5    15  14.5    14  13.5    13  12.5    12  11.5    11  10.5    10   9.5   TOTALS
 9.125      0     2     0     0     3     0     0     0     0     0     0     0     0     0     0     0        5
 8.875      4     8    17    19     0     0     0     0     0     0     0     0     0     0     0     0       48
 8.625      2    23   101   156    93    23     2     0     0     0     0     0     0     0     0     0      400
 8.375      0    18   105   494   574   227    56     9     0     0     0     0     0     0     0     0     1483
 8.125      0     4    44   375   956   913   362    73    12     3     0     0     0     0     0     0     2742
 7.875      0     0     7    81   385   871   794   330    89    19     3     0     0     0     0     0     2579
 7.625      0     0     1     4    65   236   469   361   175    55    27     4     0     0     0     0     1397
 7.375      0     0     0     0     6    23    91   137   124    78    37    22    11     0     1     0      530
 7.125      0     0     0     0     0     1    13    18    28    35    25    32    11     6     1     0      170
 6.875      0     0     0     0     0     0     0     1     9     8    21    12    13     7     1     0       72
 6.625      0     0     0     0     0     0     0     0     0     0     2     0     1     4     3     0       10
 6.375      0     0     0     0     0     0     0     0     0     1     0     0     0     1     1     1        4
TOTALS      6    55   275  1129  2082  2294  1787   929   437   199   115    70    36    18     7     1     9440

TABLE 2.9  Johannsen's Bean Data
Observed frequencies of Mullerian glands on the right (X) and left (Y) forelegs of 2,000 male pigs.




Y 

0 

1 

2 

3 

4 

X 

5 

6 

7 

8 

9 

10 

TOTALS 

8 

4 

2 

0 

0 

0 

0 

0 

0 

0 

14 

0 

5 

151 

65 

14 

5 

l 

0 

0 

0 

0 

241 

l 

1 

2 

58 

154 

88 

27 

7 

0 

0 

0 

0 

336 

2 

0 

9 

96 

173 

119 

24 

8 

1 

0 

0 

430 

3 

0 

3 

28 

128 

153 

92 

16 

8 

1 

0 

429 

4 

0 

0 

7 

28 

77 

101 

58 

20 

3 

1 

0 

295 

5 

0 

0 

1 

6 

26 

52 

48 

13 

5 

3 

159 

6 

0 

0 

0 

0 

3 

11 

16 

17 

3 

3 

53 

7 

0 

0 

0 

0 

1 

9 

7 

9 

2 

2 

0 

30 

8 

0 

0 

0 

0 

0 

0 

0 

5 

2 

2 

D 

10 

9 

0 

0 

0 

0 

0 

0 

2 

0 

0 

1 

B 

3 

10 

15 

225 

353 

437 

411 

297 

155 

78 

16 

12 

■ 

2000 

TOTALS 

30 

450 

706 

874. 

822 

594 

310 

156 

32 

24 

2 

4000 

TABLE  2.10  Pig  Data 
61 points taken from a paraboloid with added spherical gaussian noise

     X         Y         Z            X         Y         Z
  -2.732     6.557    25.507       -3.452     2.948    25.591
  -5.264     5.253    24.200       -7.261     6.959    26.789
  -5.103     5.986    26.446       -2.370     3.617    25.510
  -3.335     5.888    23.947       -4.181     4.530    29.118
  -5.420     5.607    25.321       -2.360     3.916    24.879
  -3.261     7.697    27.479       -5.297     5.802    29.073
  -4.607     6.651    26.518       -1.585     2.524    26.954
  -4.236     4.220    24.416       -3.267     4.402    28.899
  -4.947     5.363    26.918       -1.187     3.257    26.100
  -2.189     5.881    26.282       -2.095     6.931    27.269
  -2.193     5.953    26.962       -4.800     3.339    27.011
  -4.838     5.909    25.196       -5.602     5.322    28.759
  -3.448     5.610    27.489       -1.478     1.644    26.057
  -0.990     5.391    25.667       -5.151     4.481    27.583
  -6.116     6.326    30.189       -0.694     3.408    24.997
  -2.175     4.645    25.613       -5.687     4.766    29.640
  -5.849     6.876    26.070       -1.733     3.932    26.198
   0.162     5.521    25.027       -6.154     4.932    29.631
  -5.360     5.494    28.675       -3.823     3.784    25.123
  -1.740     4.070    27.311       -2.588     4.923    28.343
  -2.975     6.716    27.999       -3.237     3.648    26.249
  -4.220     3.853    26.396       -5.740     4.537    30.277
  -6.306     4.573    25.715       -0.709     1.542    27.240
  -1.972     5.615    24.900       -6.568     5.335    29.631
  -4.497     5.314    27.978       -1.669     1.501    25.413
  -2.005     3.352    24.599       -7.690     4.578    30.363
  -3.809     5.421    28.794        0.837     1.271    25.303
  -2.081     3.795    25.542       -5.832     7.020    28.915
  -4.907     7.120    27.449       -0.405     3.669    27.587
  -0.742     2.800    26.394       -3.019     3.752    29.665
  -2.750     2.233    27.669

TABLE 2.11  Gnanadesikan Data Set




          Iris Setosa                  Iris Versicolor               Iris Virginica
  Sepal  Sepal  Petal  Petal     Sepal  Sepal  Petal  Petal     Sepal  Sepal  Petal  Petal
  Length Width  Length Width     Length Width  Length Width     Length Width  Length Width
   5.1    3.5     ?     6.2       7.0    3.2    4.7    1.4       6.3    3.3    6.0    2.5
   4.9    3.0    1.4    0.2       6.4    3.2    4.5    1.5       5.8    2.7    5.1    1.9
   4.7    3.2    1.3    0.2       6.9    3.1    4.9    1.5       7.1    3.0    5.9    2.1
   4.6    3.1    1.5    0.2       5.5    2.3    4.0    1.3       6.3    2.9    5.6    1.8
   5.0    3.6    1.4    0.2       6.5    2.8    4.6    1.5       6.5    3.0    5.8    2.2
   5.4    3.9    1.7    0.4       5.7    2.8    4.5    1.3       7.6    3.0    6.6    2.1
   4.6    3.4    1.4    0.3       6.3    3.3    4.7    1.6       4.9    2.5    4.5    1.7
   5.0    3.4    1.5    0.2       4.9    2.4    3.3    1.0       7.3    2.9    6.3    1.8
   4.4    2.9    1.4    0.2       6.6    2.9    4.6    1.3       6.7    2.5    5.8    1.8
   4.9    3.1    1.5    0.1       5.2    2.7    3.9    1.4       7.2    3.6    6.1    2.5
   5.4    3.7    1.5    0.2       5.0    2.0    3.5    1.0       6.5    3.2    5.1    2.0
   4.8    3.4    1.6    0.2       5.9    3.0    4.2    1.5       6.4    2.7    5.3    1.9
   4.8    3.0    1.4    0.1       6.0    2.2    4.0    1.0       6.8    3.0    5.5    2.1
   4.3    3.0    1.1    0.1       6.1    2.9    4.7    1.4       5.7    2.5    5.0    2.0
   5.8    4.0    1.2    0.2       5.6    2.9    3.6    1.3       5.8    2.8    5.1    2.4
   5.7    4.4    1.5    0.4       6.7    3.1    4.4    1.4       6.4    3.2    5.3    2.3
   5.4    3.9    1.3    0.4       5.6    3.0    4.5    1.5       6.5    3.0    5.5    1.8
   5.1    3.5    1.4    0.3       5.8    2.7    4.1    1.0       7.7    3.3    6.7    2.2
   5.7    3.8    1.7    0.3       6.2    2.2    4.5    1.5       7.7    2.5    6.9    2.3
   5.1    3.8    1.5    0.3       5.6    2.5    3.9    1.1       6.0    2.2    5.0    1.5
   5.4    3.4    1.7    0.2       5.9    3.2    4.8    1.8       6.9    3.2    5.7    2.3
   5.1    3.7    1.5    0.4       6.1    2.8    4.0    1.3       5.6    2.3    4.9    2.0
   4.6    3.6    1.0    0.2       6.3    2.5    4.9    1.5       7.7    2.8    6.7    2.0
   5.1    3.3    1.7    0.5       6.1    2.8    4.7    1.2       6.3    2.7    4.9    1.8
   4.8    3.4    1.9    0.2       6.4    2.9    4.3    1.3       6.7    3.3    5.7    2.1
   5.0    3.0    1.6    0.2       6.6    3.0    4.4    1.4       7.2    3.2    6.0    1.8
   5.0    3.4    1.6    0.4       6.8    2.8    4.8    1.4       6.2    2.3    4.8    1.8
   5.2    3.5    1.5    0.2       6.7    3.0    5.0    1.7       6.1    3.0    4.9    1.8
   5.2    3.4    1.4    0.2       6.0    2.9    4.5    1.5       6.4    2.8    5.6    2.1
   4.7    3.2    1.6    0.2       5.7    2.6    3.5    1.0       7.2    3.0    5.8    1.6
   4.3    3.1    1.6    0.2       5.5    2.4    3.8    1.1       7.4    2.3    6.1    1.9
   5.4    3.4    1.5    0.4       5.5    2.4    3.7    1.0       7.9    3.3    6.4    2.0
5.2 

ll  4.1 

a 

1.5 

ii 

0.1 

5.8 

II  2.7 

i 

3-9 

a 

1.2 

6 .4 

a  2.3 

a 

5.6 

a 

2.2 

5-5 

a  4.2 

a 

1.4 

a 

0.2 

6.0 

8  2. -7 

a 

5.1 

a 

1.6 

6.3 

a  2.3 

a 

5.1 

a 

1.5 

4.9 

a  3.1 

a 

1.5 

a 

0.2 

5.4 

II  3.0 

a 

4.5 

a 

1.5 

6.1 

a  2.6 

a 

5.6 

a 

1.4 

5.0 

ll  3.2 

a 

1.2 

a 

0.2 

6.0 

II  3.4 

a 

4.5 

a 

1.6 

7.7 

a  3.0 

a 

6.1 

a 

2.3 

5.5 

a  3.5 

a 

1.3 

a 

0.2 

6.7 

9  3.1 

a 

4.7 

a 

1.5 

6.3 

a  3.4 

a 

5.6 

a 

2.4 

4.9 

II  3.6 

a 

1.4 

a 

0.1 

6.3 

a  2.3 

a 

4.4 

a 

1.3 

6 .4 

a  3.1 

a 

5.5 

3 

1.8 

4.4 

a  3.0 

a 

1.3 

a 

0.2 

5.6 

a  3.0 

a 

4.1 

a 

1.3 

6.0 

a  3.0 

a 

4.3 

;t 

1.8 

5.1 

Jl  3.4 

a 

1 . 5 

a 

0.2 

5  •  5 

8  2.5 

a 

4.0 

a 

1.3 

6.9 

a  3.1 

a 

5.4 

1 

2.1 

5.0 

a  3.5 

a 

1.3 

a 

0.3 

5.5 

ll  2.6 

a 

4.4 

a 

1.2 

6.7 

a  3.1 

a 

5.6 

il 

2.4 

4.5 

II  2.3 

a 

1.3 

a 

0.3 

6.1 

a  3.0 

a 

4.6 

a 

1.4 

6.9 

a  3.1 

a 

5.1 

a 

2.3 

4.4 

9  3.2 

a 

1.3 

ii 

0.2 

5.8 

II  2.6 

a 

4.0 

a 

1.2 

5-8 

a  2.7 

a 

5.1 

8 

1.9 

5.0 

8  3.5 

a 

1.6 

n 

0.6 

5.0 

ll  2.3 

a 

3.3 

a 

1.0 

6.8 

a  3.2 

a 

5.9 

II 

2.3 

5.1 

ll  3.8 

a 

1.9 

a 

0.4 

5.6 

II  2.7 

a 

4.2 

a 

1.3 

6.7 

a  3.3 

a 

5.7 

ll 

2.5 

4.8 

8  3.0 

a 

1.4 

a 

0.3 

5.7 

9  3.0 

a 

4.2 

a 

1.2 

6.7 

a  3.0 

a 

5.2 

II 

2.3 

5.1 

a  3.8 

a 

1.6 

a 

0.2 

5.7 

a  2.9 

a 

4.2 

a 

1.3 

6.3 

a  2.5 

a 

5.0 

II 

1.9 

4.6 

9  3.2 

a 

1.4 

a 

0.2 

6.2 

a  2.9 

a 

4.3 

a 

1.3 

6.5 

a  3.0 

a 

5.2 

II 

2.0 

5.3 

a  3.7 

a 

1.5 

a 

0.2 

5.1 

8  2.5 

a 

3.0 

a 

1.1 

6.2 

a  3.4 

ii 

5.4 

a 

2.3 

5.0 

a  3.3 

a 

1.4 

a 

0.2 

5.7 

a  2.8 

a 

4.1 

a 

1.3 

5.9 

a  3.0 

a 

5.1 

a 

1.8 

TABLE  2.12  Iris  Data 




a. Gnanadesikan Data Set

                   ω²_n      p-value   max ω²_n  p-value   ω²_{n*p}  p-value
  margin X         .083      >.15      -         -         -         -
  margin Y         .051      >.15      -         -         -         -
  margin Z         .109      .08       -         -         -         -
  trivariate       .024      >.15      .136      .11       .065      >.15

b. Iris Data

                   ω²_n      p-value   max ω²_n  p-value   ω²_{n*p}  p-value
setosa
  margin SL        .072      >.15      -         -         -         -
  margin SW        .075      >.15      -         -         -         -
  margin PL        .190      .07       -         -         -         -
  margin PW        .977      0         -         -         -         -
  quadrivariate    .008      >.15      .124      >.15      .059      >.15
versicolor
  margin SL        .057      >.15      -         -         -         -
  margin SW        .103      .10       -         -         -         -
  margin PL        .010      >.15      -         -         -         -
  margin PW        .152      .02       -         -         -         -
  quadrivariate    .016      .01       .191      .03       .081      >.15
virginica
  margin SL        .089      >.15      -         -         -         -
  margin SW        .108      .08       -         -         -         -
  margin PL        .086      >.15      -         -         -         -
  margin PW        .118      .06       -         -         -         -
  quadrivariate    .006      >.15      .060      >.15      .050      >.15
all iris
  margin SL        .127      .05       -         -         -         -
  margin SW        .181      .01       -         -         -         -
  margin PL        1.222     0         -         -         -         -
  margin PW        .722      0         -         -         -         -
  quadrivariate    .019      0         .210      .02       .088      >.15
versicolor plus virginica
  margin SL        .066      >.15      -         -         -         -
  margin SW        .158      .02       -         -         -         -
  margin PL        .047      >.15      -         -         -         -
  margin PW        .243      0         -         -         -         -
  quadrivariate    .018      0         .198      .02       .078      >.15

TABLE 2.13 Behavior of ω²_n, max ω²_n and ω²_{n*p} in Examples 3 and 4
Gnanadesikan Data Set

Statistic       Value     p-value
ω²_{n,p}        0.024     >.15
max ω²_{p,n}    0.14      .11
b_{1,p}         1.19      >.15
b_{2,p}         12.15     >.15
SW              0.98      .02
AD              1.46      .07

Iris Data

                setosa              versicolor          virginica
Statistic       Value     p-value   Value     p-value   Value     p-value
ω²_{n,p}        0.008     >.15      0.016     .01       0.006     >.15
max ω²_{p,n}    0.124     >.15      0.191     .03       0.060     >.15
b_{1,p}         3.08      .12       2.91      >.15      3.15      .11
b_{2,p}         0.97      .03       22.60     >.15      24.30     >.15
SW              1.08      >.15      0.95      >.15      0.94      >.15
AD              0.17      .08       0.36      >.15      0.42      >.15

Table 2.14. Behavior of ω²_{n,p}, max ω²_{p,n} and Competitive Statistics in Examples 3 and 4


FIGURE 2.1 Comparison of Critical Values; n = 8, ..., 12; p = 1; α = .10, .05, .01

FIGURE 2.2 Frequency Graph of ω², p = 1, 2, 3
ABSTRACT

This paper presents numerical solution formulations for a generalized harmonic balance method, together with computational results for several specific examples of forced nonlinear vibrations. In a previous paper, approximate equations were derived using this harmonic balance method; the main results of that paper are summarized here. An efficient formulation for numerical solutions is then described. The initial conditions needed in the generalized harmonic balance method can be derived from the given initial conditions, and this relation is also derived here. Finally, several specific examples are worked out. The numerical results include phase diagrams, the evolution of various harmonics, and comparisons between the present harmonic balance solutions and those obtained by integrating the original differential equation. Although only subharmonic cases are treated in the present paper, the formulation should also apply to superharmonic solutions.
1. INTRODUCTION


This paper is a sequel to one published in the Proceedings of the 1988 Army Conference on Applied Mathematics and Computing [1]. In that paper, equations were derived using a generalized harmonic balance (G.H.B.) method for problems of forced nonlinear oscillation in which the nonlinearity has a polynomial form depending on a small parameter $\varepsilon$. The method of harmonic balance proceeds by substituting a series of periodic functions into the given equation and then equating to zero the coefficients of like harmonics. As noted by Nayfeh (see reference [2], for example), this approach may lead to erroneous results if it is carried out in a simple, straightforward fashion. Unfortunately, the main alternative, the method of multiple scales, involves considerably laborious algebra: elimination of secular terms, solution of differential equations at intermediate steps, and then reconstitution of the multiple-scales results to obtain the equations governing the evolution of the amplitude and phase of the oscillation. We have shown in [1], however, that by using only the simple part of the multiple-scales method to give the form of the solution, and then applying a generalized harmonic balance method, we can obtain the desired end equations directly, avoiding much of the laborious algebra of multiple scales, such as solving the intermediate differential equations and reconstitution.

The present paper deals with the numerical solution of the equations derived in [1]. In Section 2, the main results derived in [1] are summarized. An efficient formulation for numerical solutions is presented in Section 3. The initial conditions needed in the generalized harmonic balance method can be derived from the given initial conditions; this relation is derived in Section 4. Finally, in Section 5, several specific examples are worked out. The numerical results include phase diagrams, the evolution of various harmonics, and comparisons between the G.H.B. solutions and those obtained by integrating the original differential equation. Although only subharmonic cases are treated in the present paper, the formulation should also apply to superharmonic solutions.


2.  A  BRIEF  SUMMARY  OF  PREVIOUS  RESULTS 


Some of the key equations and results from the previous paper [1] are given here for easy reference. The nonlinear ordinary differential equation of interest is

$$\frac{d^2u}{dt^2} + u + 2\varepsilon\mu\frac{du}{dt} + \varepsilon\alpha_2 u^2 + \varepsilon^2\alpha_3 u^3 + \varepsilon\alpha_4\left(\frac{du}{dt}\right)^2 + \varepsilon^2\alpha_5 u\left(\frac{du}{dt}\right)^2 = 2f\cos(\Omega t) \qquad (1)$$

where $u(t)$ is the unknown function; $\mu$ and $\alpha_k$, $k = 2, 3, 4, 5$, are given constants; $\varepsilon$ is the small parameter mentioned earlier; and $f$ and $\Omega$ pertain to the magnitude and frequency of the forcing function. This equation has been treated previously by Nayfeh using the method of multiple scales [3, 4].

In the subharmonic case,

$$\Omega = 2 + \varepsilon\sigma \qquad (2)$$

where $\sigma$ is a given detuning parameter.

It was shown in [1] that the solution $u(t)$ of (1) can be written in the form

$$u = \varepsilon U_0 + \left[(U_1A + U_2A^2) + \varepsilon(U_3A^3 + U_4A^4) + \text{c.c.}\right] \qquad (3)$$

where terms of order higher than one in $\varepsilon$ have been neglected,

$$A = e^{it} \qquad (4)$$

and c.c. stands for the complex conjugate. Also, $U_k$, $k = 0, 1, \ldots, 4$, are slowly varying compared with $A$ in (4), in the sense that while $dA/dt$ is of order unity, $dU_k/dt$ is $O(\varepsilon)$.

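The equations needed to obtain the $U_k$ were derived in [1]; they are

$$2i\left(\frac{dU_1}{dt} + \varepsilon\mu U_1\right) - \frac{2}{3}\varepsilon(\alpha_2 + 2\alpha_4)fS\bar{U}_1 + \varepsilon^2\Big\{\Big[-\mu^2 + \frac{2}{9}f^2(3\alpha_3 + 4\alpha_5) - \frac{1}{18}f^2(5\alpha_2^2 + 12\alpha_2\alpha_4 - 12\alpha_4^2)\Big]U_1 + \frac{1}{3}(9\alpha_3 + 3\alpha_5 - 10\alpha_2^2 - 10\alpha_2\alpha_4 - 4\alpha_4^2)U_1^2\bar{U}_1 - \frac{4}{9}i\mu(2\alpha_2 + \alpha_4)fS\bar{U}_1 + \frac{1}{9}\sigma(11\alpha_2 + 16\alpha_4)fS\bar{U}_1\Big\} = 0 \qquad (5)$$

$$U_2 = -\frac{1}{3}fS + \frac{1}{3}\varepsilon\left[(\alpha_2 - \alpha_4)U_1^2 - \frac{4}{3}(i\mu - \sigma)fS\right] \qquad (6)$$

$$U_0 = -2(\alpha_2 + \alpha_4)U_1\bar{U}_1 - 2(\alpha_2 + 4\alpha_4)U_2\bar{U}_2 \qquad (7)$$

$$U_3 = \frac{1}{4}(\alpha_2 - 2\alpha_4)U_1U_2 \qquad (8)$$

$$U_4 = \frac{1}{15}(\alpha_2 - 4\alpha_4)U_2^2 \qquad (9)$$

where a bar above a variable denotes its complex conjugate, and

$$S = e^{i\varepsilon\sigma t} \qquad (10)$$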
3. SOLUTION FORMULATION FOR THE GENERALIZED HARMONIC BALANCE METHOD


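We first simplify (5) by introducing the following constants:

$$c_1 = -\frac{2f(\alpha_2 + 2\alpha_4)}{3}, \qquad c_2 = -\mu^2 + \frac{2f^2(3\alpha_3 + 4\alpha_5)}{9} - \frac{f^2(5\alpha_2^2 + 12\alpha_2\alpha_4 - 12\alpha_4^2)}{18},$$
$$c_3 = \frac{9\alpha_3 + 3\alpha_5 - 10\alpha_2^2 - 10\alpha_2\alpha_4 - 4\alpha_4^2}{3}, \qquad c_4 = -\frac{4\mu f(2\alpha_2 + \alpha_4)}{9}, \qquad c_5 = \frac{\sigma f(11\alpha_2 + 16\alpha_4)}{9} \qquad (11)$$

Equation (5) can now be written as

$$2i\left(\frac{dU_1}{dt} + \varepsilon\mu U_1\right) + \varepsilon c_1S\bar{U}_1 + \varepsilon^2\left[c_2U_1 + c_3U_1^2\bar{U}_1 + (ic_4 + c_5)S\bar{U}_1\right] = 0 \qquad (12)$$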
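To check (11) numerically, the constants can be evaluated directly from the parameters of (1). The Python sketch below is illustrative only. The function name `ghb_constants` is hypothetical, and $c_2$ includes the $-\mu^2$ term because (11) is read here as collecting every $O(\varepsilon^2)$ coefficient of $U_1$ in (5).

```python
def ghb_constants(a2, a3, a4, a5, mu, sigma, f):
    """Return (c1, c2, c3, c4, c5) of (11), built from the coefficients of (5).

    c2 collects every O(eps**2) coefficient of U1 in (5), including -mu**2.
    """
    c1 = -2.0 * f * (a2 + 2.0 * a4) / 3.0
    c2 = (-mu**2
          + 2.0 * f**2 * (3.0 * a3 + 4.0 * a5) / 9.0
          - f**2 * (5.0 * a2**2 + 12.0 * a2 * a4 - 12.0 * a4**2) / 18.0)
    c3 = (9.0 * a3 + 3.0 * a5 - 10.0 * a2**2 - 10.0 * a2 * a4 - 4.0 * a4**2) / 3.0
    c4 = -4.0 * mu * f * (2.0 * a2 + a4) / 9.0
    c5 = sigma * f * (11.0 * a2 + 16.0 * a4) / 9.0
    return c1, c2, c3, c4, c5


# Data Set I of Section 5: alpha_2 ... alpha_5 = 0.1, mu = 0.1, sigma = 1.0, f = 1.0
print(ghb_constants(0.1, 0.1, 0.1, 0.1, mu=0.1, sigma=1.0, f=1.0))
```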


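Note that (12) is unsuitable for numerical work because the variable $S$ in the equation, defined in (10), is time-dependent. Because the equation involves only terms of the forms $U_1$, $S\bar{U}_1$ and $U_1^2\bar{U}_1$, each linearly, we can convert (12) into a differential equation with constant coefficients by setting

$$U_k = V_kS^{k/2} \qquad (13)$$

Then

$$\frac{dU_k}{dt} = \left(\frac{dV_k}{dt} + \frac{ik\varepsilon\sigma V_k}{2}\right)S^{k/2} \qquad (14)$$

Since $S\bar{U}_1 = \bar{V}_1S^{1/2}$ and $U_1^2\bar{U}_1 = V_1^2\bar{V}_1S^{1/2}$, dividing (12) by $S^{1/2}$ gives

$$2i\left(\frac{dV_1}{dt} + \frac{i\varepsilon\sigma V_1}{2} + \varepsilon\mu V_1\right) + \varepsilon c_1\bar{V}_1 + \varepsilon^2\left[c_2V_1 + c_3V_1^2\bar{V}_1 + (ic_4 + c_5)\bar{V}_1\right] = 0 \qquad (15)$$

In terms of $V_k$, equations (6)-(9) become

$$V_2 = -\frac{1}{3}f + \frac{1}{3}\varepsilon\left[(\alpha_2 - \alpha_4)V_1^2 - \frac{4}{3}(i\mu - \sigma)f\right] \qquad (16)$$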
$$V_0 = -2(\alpha_2 + \alpha_4)V_1\bar{V}_1 - 2(\alpha_2 + 4\alpha_4)V_2\bar{V}_2 \qquad (17)$$

$$V_3 = \frac{1}{4}(\alpha_2 - 2\alpha_4)V_1V_2 \qquad (18)$$

$$V_4 = \frac{1}{15}(\alpha_2 - 4\alpha_4)V_2^2 \qquad (19)$$

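Also, equation (3) becomes

$$u(t) = \varepsilon V_0 + \left[(V_1B + V_2B^2) + \varepsilon(V_3B^3 + V_4B^4) + \text{c.c.}\right] \qquad (20)$$

where

$$B = e^{i\theta} = \cos\theta + i\sin\theta, \qquad \theta = \left(1 + \frac{\varepsilon\sigma}{2}\right)t \qquad (21)$$

It will be convenient to use real functions for the computations. To do this, introduce

$$V_k = V_{kR} + iV_{kI} \qquad (22)$$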

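Equations (15)-(19) then become the following. In (24), the factor $f$ inside the $O(\varepsilon)$ bracket of (16) has been replaced by its zero-order equivalent $-3V_2$.

$$2\frac{dV_{1R}}{dt} = \varepsilon\left[-2\mu V_{1R} + (\sigma + c_1)V_{1I}\right] + \varepsilon^2\left[-c_4V_{1R} - (c_2 - c_5)V_{1I} - c_3\left(V_{1R}^2 + V_{1I}^2\right)V_{1I}\right]$$
$$2\frac{dV_{1I}}{dt} = \varepsilon\left[-2\mu V_{1I} - (\sigma - c_1)V_{1R}\right] + \varepsilon^2\left[c_4V_{1I} + (c_2 + c_5)V_{1R} + c_3\left(V_{1R}^2 + V_{1I}^2\right)V_{1R}\right] \qquad (23)$$

$$3V_{2R} = -f + \varepsilon\left[-4(\sigma V_{2R} + \mu V_{2I}) + (\alpha_2 - \alpha_4)\left(V_{1R}^2 - V_{1I}^2\right)\right]$$
$$3V_{2I} = \varepsilon\left[-4(\sigma V_{2I} - \mu V_{2R}) + 2(\alpha_2 - \alpha_4)V_{1R}V_{1I}\right] \qquad (24)$$

$$V_0 = -2(\alpha_2 + \alpha_4)\left(V_{1R}^2 + V_{1I}^2\right) - 2(\alpha_2 + 4\alpha_4)\left(V_{2R}^2 + V_{2I}^2\right) \qquad (25)$$

$$V_{3R} = \frac{(\alpha_2 - 2\alpha_4)(V_{1R}V_{2R} - V_{1I}V_{2I})}{4}, \qquad V_{3I} = \frac{(\alpha_2 - 2\alpha_4)(V_{1R}V_{2I} + V_{1I}V_{2R})}{4} \qquad (26)$$

$$V_{4R} = \frac{(\alpha_2 - 4\alpha_4)\left(V_{2R}^2 - V_{2I}^2\right)}{15}, \qquad V_{4I} = \frac{2(\alpha_2 - 4\alpha_4)V_{2R}V_{2I}}{15} \qquad (27)$$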


The procedure now is to solve (23), then substitute the resulting $V_{1R}$, $V_{1I}$ in (24). On the right-hand side of (24), only a zero-order approximation to $V_{2R}$ and $V_{2I}$ is needed, namely $V_{2R} = -f/3$, $V_{2I} = 0$. This gives a first-order approximation for $V_{2R}$, $V_{2I}$. These values can then be substituted in (25)-(27) to give $V_0$, $V_{3R}$, $V_{3I}$, $V_{4R}$ and $V_{4I}$.

This procedure requires values for $V_{1R}$, $V_{1I}$ at $t = 0$ as initial conditions to start the numerical integration of (23). This process is considered in the next section.
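The procedure just described is sketched below. It assumes fixed-step classical fourth-order Runge-Kutta integration, chosen here for simplicity rather than taken from the paper, and uses hypothetical helper names. `rhs_v1` evaluates (23), `higher_harmonics` evaluates (24)-(27) with the zero-order $V_2$ on the right-hand side, and `integrate_v1` advances $V_{1R}$, $V_{1I}$ in time. The constants `c = (c1, ..., c5)` are assumed to have been computed from (11).

```python
def rhs_v1(v1r, v1i, eps, mu, sigma, c):
    """dV1R/dt and dV1I/dt from the real system (23); c = (c1, ..., c5) of (11)."""
    c1, c2, c3, c4, c5 = c
    r2 = v1r**2 + v1i**2
    dv1r = 0.5 * (eps * (-2.0 * mu * v1r + (sigma + c1) * v1i)
                  + eps**2 * (-c4 * v1r - (c2 - c5) * v1i - c3 * r2 * v1i))
    dv1i = 0.5 * (eps * (-2.0 * mu * v1i - (sigma - c1) * v1r)
                  + eps**2 * (c4 * v1i + (c2 + c5) * v1r + c3 * r2 * v1r))
    return dv1r, dv1i


def higher_harmonics(v1r, v1i, eps, mu, sigma, f, a2, a4):
    """V2 from (24) (zero-order V2 = -f/3 on the right), then V0, V3, V4 from (25)-(27)."""
    v2r0, v2i0 = -f / 3.0, 0.0
    v2r = (-f + eps * (-4.0 * (sigma * v2r0 + mu * v2i0)
                       + (a2 - a4) * (v1r**2 - v1i**2))) / 3.0
    v2i = eps * (-4.0 * (sigma * v2i0 - mu * v2r0) + 2.0 * (a2 - a4) * v1r * v1i) / 3.0
    v0 = -2.0 * (a2 + a4) * (v1r**2 + v1i**2) - 2.0 * (a2 + 4.0 * a4) * (v2r**2 + v2i**2)
    v3 = ((a2 - 2.0 * a4) * (v1r * v2r - v1i * v2i) / 4.0,
          (a2 - 2.0 * a4) * (v1r * v2i + v1i * v2r) / 4.0)
    v4 = ((a2 - 4.0 * a4) * (v2r**2 - v2i**2) / 15.0,
          2.0 * (a2 - 4.0 * a4) * v2r * v2i / 15.0)
    return v0, (v2r, v2i), v3, v4


def integrate_v1(v1r, v1i, t_end, dt, eps, mu, sigma, c):
    """Classical fixed-step RK4 for (23); returns a list of (t, V1R, V1I)."""
    path = [(0.0, v1r, v1i)]
    steps = int(round(t_end / dt))
    for k in range(steps):
        k1 = rhs_v1(v1r, v1i, eps, mu, sigma, c)
        k2 = rhs_v1(v1r + 0.5 * dt * k1[0], v1i + 0.5 * dt * k1[1], eps, mu, sigma, c)
        k3 = rhs_v1(v1r + 0.5 * dt * k2[0], v1i + 0.5 * dt * k2[1], eps, mu, sigma, c)
        k4 = rhs_v1(v1r + dt * k3[0], v1i + dt * k3[1], eps, mu, sigma, c)
        v1r += dt * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        v1i += dt * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
        path.append(((k + 1) * dt, v1r, v1i))
    return path


# Hypothetical usage: (11) evaluated for Data Set I (rounded), arbitrary starting values.
c_set1 = (-0.2, 0.1428, 0.32, -0.01333, 0.3)
path = integrate_v1(0.5, -0.25, 600.0, 0.05, 0.1, 0.1, 1.0, c_set1)
t, v1r, v1i = path[-1]
print(t, (v1r**2 + v1i**2) ** 0.5, higher_harmonics(v1r, v1i, 0.1, 0.1, 1.0, 1.0, 0.1, 0.1)[1])
```

The starting values in the last lines are placeholders. Section 4 describes how the paper obtains them.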

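To recover the solution $u(t)$ of the original equation (1), we substitute (21) and (22) in (20) to obtain

$$u(t) = \varepsilon V_0 + 2\left[V_{1R}\cos\theta - V_{1I}\sin\theta + V_{2R}\cos 2\theta - V_{2I}\sin 2\theta\right] + 2\varepsilon\left[V_{3R}\cos 3\theta - V_{3I}\sin 3\theta + V_{4R}\cos 4\theta - V_{4I}\sin 4\theta\right] \qquad (28)$$

and, retaining only the time derivatives of $V_{1R}$ and $V_{1I}$,

$$v(t) = \frac{du}{dt} = -2\Big\{\Big(1 + \frac{\varepsilon\sigma}{2}\Big)\left[V_{1R}\sin\theta + V_{1I}\cos\theta\right] + 2\Big(1 + \frac{\varepsilon\sigma}{2}\Big)\left[V_{2R}\sin 2\theta + V_{2I}\cos 2\theta\right] + 3\varepsilon\left[V_{3R}\sin 3\theta + V_{3I}\cos 3\theta\right] + 4\varepsilon\left[V_{4R}\sin 4\theta + V_{4I}\cos 4\theta\right]\Big\} + 2\left[\frac{dV_{1R}}{dt}\cos\theta - \frac{dV_{1I}}{dt}\sin\theta\right] \qquad (29)$$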
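The reconstitution formulas (28) and (29) translate directly into code. In this sketch, `reconstruct_uv` is a hypothetical name and each $V_k$ ($k \ge 1$) is passed as a (real, imaginary) pair. As in (29), only the time derivatives of $V_1$ are included.

```python
import math


def reconstruct_uv(t, eps, sigma, v0, v1, v2, v3, v4, dv1=(0.0, 0.0)):
    """u(t) from (28) and v(t) = du/dt from (29).

    v1..v4 are (real, imaginary) pairs; dv1 = (dV1R/dt, dV1I/dt) at time t.
    """
    c = 1.0 + eps * sigma / 2.0
    th = c * t  # theta of (21)
    (v1r, v1i), (v2r, v2i), (v3r, v3i), (v4r, v4i) = v1, v2, v3, v4
    cos, sin = math.cos, math.sin
    u = (eps * v0
         + 2.0 * (v1r * cos(th) - v1i * sin(th) + v2r * cos(2 * th) - v2i * sin(2 * th))
         + 2.0 * eps * (v3r * cos(3 * th) - v3i * sin(3 * th)
                        + v4r * cos(4 * th) - v4i * sin(4 * th)))
    v = (-2.0 * (c * (v1r * sin(th) + v1i * cos(th))
                 + 2.0 * c * (v2r * sin(2 * th) + v2i * cos(2 * th))
                 + 3.0 * eps * (v3r * sin(3 * th) + v3i * cos(3 * th))
                 + 4.0 * eps * (v4r * sin(4 * th) + v4i * cos(4 * th)))
         + 2.0 * (dv1[0] * cos(th) - dv1[1] * sin(th)))
    return u, v


print(reconstruct_uv(0.0, 0.1, 1.0, 1.0, (0.5, 0.2), (-0.3, 0.1), (0.01, 0.02), (0.03, 0.04)))
```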


4. INITIAL CONDITIONS FOR $V_{1R}(t)$, $V_{1I}(t)$


In this section, the symbols $V_0$, $V_{1R}$, $V_{1I}$, etc. refer to the values of these quantities at $t = 0$. The initial conditions for the original equation (1) are given as

$$u(0) = u_0, \qquad v(0) = v_0 \qquad (30)$$


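At $t = 0$, $\theta = 0$ in (28) and (29). Solving the resulting equations for $V_{1R}$ and $V_{1I}$, we obtain

$$V_{1R} = \frac{1}{2}\left[u_0 - 2V_{2R} - 2\varepsilon\left(\frac{V_0}{2} + V_{3R} + V_{4R}\right)\right], \qquad V_{1I} = \frac{1}{c}\left\{-\frac{1}{2}\left[v_0 + 4cV_{2I} + 2\varepsilon(3V_{3I} + 4V_{4I})\right] + \frac{dV_{1R}}{dt}\right\} \qquad (31)$$

where $c = 1 + \varepsilon\sigma/2$. We then use the following iterative procedure: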


STEP 1: Drop terms of first order in $\varepsilon$ on the right-hand sides of (24) and (31) to obtain

$$V_{2R} = -\frac{f}{3}, \qquad V_{2I} = 0, \qquad V_{1R} = \frac{u_0}{2} + \frac{f}{3}, \qquad V_{1I} = -\frac{v_0}{2}$$

STEP 2: Using these values in the right-hand side of (23) and dropping terms of second order in $\varepsilon$ gives

$$\frac{dV_{1R}}{dt} = \frac{\varepsilon}{2}\left[-2\mu\left(\frac{u_0}{2} + \frac{f}{3}\right) - (\sigma + c_1)\frac{v_0}{2}\right]$$

STEP 3: Using (25)-(27), calculate $V_0$, $V_3$ and $V_4$ with $V_1$ and $V_2$ obtained in STEP 1.

STEP 4: Substitute in the right-hand side of (24) the values of $V_1$ and $V_2$ obtained in STEP 1 to obtain a new value of $V_2$, now correct to first order in $\varepsilon$.

STEP 5: Finally, in (31), substitute $V_2$ obtained in STEP 4, $V_0$, $V_3$ and $V_4$ obtained in STEP 3, and $dV_{1R}/dt$ obtained in STEP 2 to calculate the new initial value of $V_1$, now correct to first order in $\varepsilon$.


This method for obtaining initial conditions for $V_{1R}(t)$, $V_{1I}(t)$, correct to order $\varepsilon$, is not unique. The question of the "best" choice of initial conditions requires further investigation.
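The five steps can be collected into a single function. This is a sketch based on the equations as written in this section, using a hypothetical name `initial_v1`. It takes $c_1$ from (11) as an input and returns the first-order initial values of $V_{1R}$ and $V_{1I}$.

```python
def initial_v1(u0, v0, eps, mu, sigma, f, a2, a4, c1):
    """Initial V1R, V1I via STEPS 1-5 of Section 4 (first order in eps)."""
    c = 1.0 + eps * sigma / 2.0
    # STEP 1: zero-order values from (24) and (31)
    v2r, v2i = -f / 3.0, 0.0
    v1r, v1i = u0 / 2.0 + f / 3.0, -v0 / 2.0
    # STEP 2: dV1R/dt from (23), keeping only the O(eps) terms
    dv1r = 0.5 * eps * (-2.0 * mu * v1r + (sigma + c1) * v1i)
    # STEP 3: V0, V3, V4 from (25)-(27) using the STEP 1 values of V1 and V2
    V0 = -2.0 * (a2 + a4) * (v1r**2 + v1i**2) - 2.0 * (a2 + 4.0 * a4) * (v2r**2 + v2i**2)
    v3r = (a2 - 2.0 * a4) * (v1r * v2r - v1i * v2i) / 4.0
    v3i = (a2 - 2.0 * a4) * (v1r * v2i + v1i * v2r) / 4.0
    v4r = (a2 - 4.0 * a4) * (v2r**2 - v2i**2) / 15.0
    v4i = 2.0 * (a2 - 4.0 * a4) * v2r * v2i / 15.0
    # STEP 4: first-order V2 from (24) with the STEP 1 values on the right
    w2r = (-f + eps * (-4.0 * (sigma * v2r + mu * v2i) + (a2 - a4) * (v1r**2 - v1i**2))) / 3.0
    w2i = eps * (-4.0 * (sigma * v2i - mu * v2r) + 2.0 * (a2 - a4) * v1r * v1i) / 3.0
    # STEP 5: new V1 from (31)
    new_v1r = (u0 - 2.0 * w2r - 2.0 * eps * (V0 / 2.0 + v3r + v4r)) / 2.0
    new_v1i = (-(v0 + 4.0 * c * w2i + 2.0 * eps * (3.0 * v3i + 4.0 * v4i)) / 2.0 + dv1r) / c
    return new_v1r, new_v1i


# Data Set I parameters (c1 = -0.2 from (11)); u0 = 0.5 and v0 = 0 are hypothetical.
print(initial_v1(0.5, 0.0, 0.1, 0.1, 1.0, 1.0, 0.1, 0.1, -0.2))
```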
5. NUMERICAL EXAMPLES


Following the procedure described in Sections 3 and 4, several examples have been worked out. The three sets of parameters selected are:

DATA SET I: $\alpha_2 = 0.1$, $\alpha_3 = 0.1$, $\alpha_4 = 0.1$, $\alpha_5 = 0.1$, $\varepsilon = 0.1$, $\mu = 0.1$, $f = 1.0$, $\sigma = 1.0$

DATA SET II: $\alpha_2 = 1.0$, $\alpha_3 = 0.5$, $\alpha_4 = 1.0$, $\alpha_5 = 0.5$, $\varepsilon = 0.01$, $\mu = 0.5$, $f = 1.0$, $\sigma = 1.0$

DATA SET III: $\alpha_2 = 1.0$, $\alpha_3 = 0.5$, $\alpha_4 = 1.0$, $\alpha_5 = 0.5$, $\varepsilon = 0.1$, $\mu = 0.5$, $f = 1.0$, $\sigma = 1.0$

The initial conditions for all the examples are taken as $u(0) = u_0$, $(du/dt)_{t=0} = v_0$.

Note that the only difference between DATA SET II and DATA SET III is the value of $\varepsilon$, which is 0.01 for SET II but is increased to 0.1 for SET III.

For DATA SET I, Figure I(a) is the evolution curve for $V_0$, which varies from its minimum value of -0.444 and settles down to a constant of about -0.083 for large $t$ (greater than $t = 500$, say). Since $V_0$ is real, its phase angle is always zero. Figure I(b-1) shows the magnitude $|V_1|$ of $V_1$ (which is complex, as are all the other $V_k$, $k = 2, 3$ and $4$). It decreases from a maximum value of 0.949 to about 0.003 at $t = 600$. Since $V_1$ represents a subharmonic motion, the fact that $|V_1|$ diminishes to zero for large $t$ indicates that there are no subharmonic vibrations in the steady-state solution of the problem. Figure I(b-2) shows the phase angle $\beta_1$ of $V_1$, which varies almost linearly with time. The discontinuities simply reflect the fact that $\beta_1$ jumps from $-\pi$ to $\pi$ so that it remains within the preselected range $(-\pi, \pi)$. Figures I(c-1,2) show the magnitude $|V_3|$ and the phase angle $\beta_3$ of $V_3$, respectively. Note that $|V_3|$ is much smaller than $|V_1|$, with $|V_3|_{\max} \approx 0.007$, but $\beta_3$ behaves very much like $\beta_1$. The magnitudes and phase angles of $V_2$ and $V_4$ turn out to be constants, as can also be seen from equations (24) and (27) in Section 3 (for this data set $\alpha_2 = \alpha_4$, so $V_2$ does not depend on $V_1$), with

$|V_2| = 0.289$, $\beta_2 = -3.126$ rad; $|V_4| = 0.002$, $\beta_4 = -3.111$ rad.

Figures I(d-1,2) show phase diagrams, i.e., $v(t)$ vs. $u(t)$ as the parameter $t$ varies from 0 to a value large enough that a steady state has almost been reached. For Data Set I, this corresponds to a time $t$ of about 600. The left-hand side, Figure I(d-1), is the result of reconstitution, using equations (28) and (29), from the $V_k$ given above; the right-hand side, Figure I(d-2), is obtained by integrating equation (1) directly. Figures I(e-1,2) show the time evolution of $u(t)$ over a relatively short period, from $t = 0$ to $t = 100$. Again, the left-hand side, Figure I(e-1), shows the result of G.H.B. and reconstitution, and the right-hand side shows the result of integrating the original differential equation directly. Comparing Figure I(d-1) with I(d-2), and Figure I(e-1) with I(e-2), shows excellent agreement between the G.H.B. results and those obtained by integrating the original differential equation directly.

Similar results are presented in Figures II(a)-II(e) and III(a)-III(e) for Data Sets II and III. In these two data sets, however, subharmonic vibrations exist: $|V_1| = 3.406$ for Set II and $|V_1| = 0.818$ for Set III (Figures II(b-1) and III(b-1), respectively), and both stay constant as $t$ becomes large.

Comparing Figure III(d-1) with III(d-2), and III(e-1) with III(e-2), shows that the difference between the results of the two methods is more pronounced for Data Set III than for the other cases.
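For the comparison with direct integration, (1) can be written as a first-order system. The sketch below replaces the adaptive Runge-Kutta-Fehlberg scheme mentioned in the acknowledgements with a simpler fixed-step classical Runge-Kutta method. The names `Params`, `eq1_rhs` and `integrate_eq1`, and the initial values in the final line, are illustrative assumptions.

```python
import math
from typing import NamedTuple


class Params(NamedTuple):
    eps: float
    mu: float
    a2: float
    a3: float
    a4: float
    a5: float
    f: float
    sigma: float


def eq1_rhs(t, u, v, p):
    """(1) as a first-order system: returns (du/dt, dv/dt) with v = du/dt."""
    omega = 2.0 + p.eps * p.sigma  # subharmonic forcing frequency, (2)
    acc = (2.0 * p.f * math.cos(omega * t) - u - 2.0 * p.eps * p.mu * v
           - p.eps * p.a2 * u**2 - p.eps**2 * p.a3 * u**3
           - p.eps * p.a4 * v**2 - p.eps**2 * p.a5 * u * v**2)
    return v, acc


def integrate_eq1(u0, v0, t_end, dt, p):
    """Fixed-step classical RK4; returns (u, v) at t_end."""
    t, u, v = 0.0, u0, v0
    for _ in range(int(round(t_end / dt))):
        k1 = eq1_rhs(t, u, v, p)
        k2 = eq1_rhs(t + dt / 2, u + dt / 2 * k1[0], v + dt / 2 * k1[1], p)
        k3 = eq1_rhs(t + dt / 2, u + dt / 2 * k2[0], v + dt / 2 * k2[1], p)
        k4 = eq1_rhs(t + dt, u + dt * k3[0], v + dt * k3[1], p)
        u += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
    return u, v


set1 = Params(eps=0.1, mu=0.1, a2=0.1, a3=0.1, a4=0.1, a5=0.1, f=1.0, sigma=1.0)
print(integrate_eq1(0.5, 0.0, 100.0, 0.01, set1))  # u0 = 0.5, v0 = 0 are hypothetical
```

With $\varepsilon = 0$, the forced response reduces to $-(2/3)f\cos 2t$, which matches the leading term $-\frac{1}{3}fS$ of $U_2$ in (6).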


ACKNOWLEDGEMENTS: The original differential equation (1) was solved numerically using a Runge-Kutta-Fehlberg scheme [5]. The solution curves were made using a plotter routine in the Dynamical Software package [6].
Optimized Annulus-based Point-in-Region Inclusion Testing for d Dimensions


T.  M.  Cronin 

CECOM  Center  for  Signals  Warfare 
Warrenton,  VA  22186-5100 


ABSTRACT. Previous research into metrical inclusion testing for closed planar boundaries of general complexity resulted in a data structure called the inner annulus, which in this paper is shown to be the union of the set of unit normal vectors that point into the interior of a boundary. The annulus approach to the point-in-region problem circumvents the infinite-precision requirement of the winding number approach, and also avoids the counting dilemma that has plagued general implementations of the parity algorithm. The inner annulus, together with the boundary itself, serves as an argument to a function that compares distances from a query point: if the query point is nearer the annulus, it is inside the boundary; otherwise it is outside. Although the previous research presented the annulus as a declarative data structure, the resulting memory requirements were prohibitive for asymptotic boundaries. This paper presents an optimized algorithm that minimally encodes the inclusion information at each coordinate of a boundary. The inclusion information is independent of the query point and insensitive to boundary translation. A preprocessing algorithm ensures that the boundary is oriented counterclockwise (so that, by convention, the interior is always to the left). The inward-pointing unit normal vector attached to a boundary element may be computed during preprocessing, or alternatively at run time with a procedural query. If it is preprocessed, it is shown that for each boundary coordinate, three bits are necessary and sufficient to represent the instruction to attach the vector; it is suggested that these three bits be represented as an opcode at the coordinate itself. Hence, at run time, when the closest boundary element is computed, the opcode inclusion instruction is fetched and decoded along with it. This approach achieves O(log n) query time for simultaneous closest-point testing and metrical inclusion testing, with a storage requirement of O(n) and preprocessing complexity O(n log n). In the planar case, the storage constant multiplier of n is a negligible 1.09, using a standard word size of 32 bits. The technique is shown to be extensible to closed three-dimensional surfaces, which are compositely defined as stacks of planar boundaries. The problem then becomes one of locating the nearest boundary in the stack, at which point planar logic is applicable. The final portion of the paper introduces an inductive argument to extend the technique to an arbitrary number of dimensions, and it is proven that the annulus attachment opcode consumes log2(3^d - 1) bits, where d is the number of dimensions.
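The bit count quoted at the end of the abstract can be checked by counting grid directions. A point in d dimensions has 3^d - 1 neighbors with offsets in {-1, 0, 1}^d, so an opcode naming one of them needs log2(3^d - 1) bits. The sketch below rounds this up to a whole number of bits, which is an assumption. It also reproduces the planar storage factor, under the further assumption that each coordinate occupies one 32-bit word: (32 + 3)/32 = 1.09375, or about 1.09. All function names are hypothetical.

```python
import math
from itertools import product


def neighbor_offsets(d):
    """Nonzero offsets in {-1, 0, 1}**d: the directions in which an annulus point can sit."""
    return [off for off in product((-1, 0, 1), repeat=d) if any(off)]


def opcode_bits(d):
    """Whole bits needed to name one of the 3**d - 1 directions (ceiling is an assumption)."""
    return math.ceil(math.log2(3**d - 1))


def storage_multiplier(d, coord_bits=32):
    """Storage per coordinate relative to the coordinate alone, if the opcode is kept beside it.

    Assumes each coordinate occupies coord_bits bits (one 32-bit word by default).
    """
    return (coord_bits + opcode_bits(d)) / coord_bits


for d in (2, 3, 4):
    print(d, len(neighbor_offsets(d)), opcode_bits(d), storage_multiplier(d))
```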


1.0  INTRODUCTION. 


The point inclusion problem (i.e., deciding whether a point p is contained within a boundary β) is well studied. One recent text contends that "The problem of locating a point in a subdivision of the plane or in a cell complex in a higher-dimensional space is one of the oldest and best understood problems in computational geometry" [E2]. Nevertheless, a fast deterministic algorithm has continued to evade a fully successful implementation. Although several elegant theoretical techniques are described in the literature, none has been successfully implemented for boundaries of general complexity. Previous attempts at fully successful implementations of inclusion testing have failed, chiefly because of one of two oversights: a) a digital computer is limited by finite-precision arithmetic; b) detecting boundary crossings is a non-trivial process.

1.1 Statement of the Problem.

Given a point and a closed digital boundary containing n coordinates, implement a deterministic, fast algorithm to discern whether or not the point is inside the boundary. By deterministic, it is meant that the solution is always correct and not subject to round-off error due to finite-precision arithmetic. By fast, it is meant that the technique's query time is bounded by a polynomial function of n, and preferably is O(log n). In addition, the inclusion testing process must accommodate the following problematic conditions: 1) areal collapse due to low resolution of the digitizing process; 2) self-intersecting (non-simple) boundaries; 3) multiply-connected sets (Fig. 1).


Figure  1.  Planar  boundaries  may  exhibit  a  variety  of  problematic  conditions. 

1.2  Previous  Approaches  to  the  Point  Inclusion  Problem. 

It  is  not  the  intention  of  this  paper  to  provide  a  historical  perspective  of  the  boundary  inclusion 
testing  problem,  or  for  that  matter,  the  point-in-polygon  problem,  as  the  planar  case  is  called.  Suffice  it 
to  say  that  for  boundaries  of  general  complexity,  no  fast  deterministic  implementation  is  documented  in 
the  literature.  Three  popular  techniques  are  briefly  discussed  here;  they  are  the  parity  algorithm,  the 
winding  number,  and  refined  triangulation. 

1.2.1  The  Parity  Algorithm. 

Description.  The  technique  approaches  the  problem  topologically,  using  the  Jordan  Curve 
Theorem.  It  proceeds  by  drawing  a  line  from  a  query  point  through  a  boundary,  while  counting  the 
number of "crossings". The query point is inside the boundary if the count is odd; otherwise it is
outside.
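For reference, a common form of the parity (crossing-number) test is sketched below in Python. It is a generic illustration, not the implementation discussed in the cited work, and the function name is hypothetical. It applies a half-open rule to vertices that lie exactly on the ray, which handles the simplest tangent cases, but it still relies on floating-point division to locate each crossing. That is the precision issue discussed in the next paragraph.

```python
def inside_parity(point, polygon):
    """Even-odd (crossing-number) test for a closed polygon given as (x, y) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Half-open rule: an edge counts only if exactly one endpoint lies strictly
        # above the horizontal ray; a vertex exactly on the ray is treated as below it.
        if (y1 > y) != (y2 > y):
            # Abscissa where the edge meets the horizontal line through the query point.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies on the ray, to the right of the point
                inside = not inside
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(inside_parity((2, 2), square), inside_parity((5, 2), square))
```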


Barriers to Implementation. The technique may be deceived by degenerate tangent conditions that are perceived as crossings, and vice versa. Corrective measures such as vertex perturbation impose a prohibitive lag on algorithm performance and still do not guarantee a deterministic decision [E1]. One researcher doubts that a fully successful implementation can ever exist, owing to the inherent sensitivity of line intersection algorithms to finite-precision floating-point arithmetic [F1].

1.2.2  The  Winding  Number  Approach. 

Description. The technique is analytic in nature. Based on Cauchy's Theorem, the integral of an analytic function about a query point is computed. If the integral is zero, the point is judged to be outside the boundary; otherwise the integral must be a multiple of 2π, and the point is judged to be inside [G1].
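A minimal angle-summation version of the winding number is sketched below for illustration; the function name is hypothetical. The final rounding step is where finite precision enters, because the floating-point sum is only approximately a multiple of 2π. Note also that every boundary element is visited on each query.

```python
import math


def winding_number(point, polygon):
    """Winding number of a closed polygon about point, by summing signed edge angles."""
    px, py = point
    total = 0.0
    n = len(polygon)
    for i in range(n):
        ax, ay = polygon[i][0] - px, polygon[i][1] - py
        bx, by = polygon[(i + 1) % n][0] - px, polygon[(i + 1) % n][1] - py
        # Signed angle from vector a to vector b, in (-pi, pi].
        total += math.atan2(ax * by - ay * bx, ax * bx + ay * by)
    # The floating-point total is only approximately a multiple of 2*pi,
    # so it must be rounded to recover an integer.
    return round(total / (2.0 * math.pi))


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(winding_number((2, 2), square), winding_number((5, 2), square))
```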

Barriers to Implementation. Round-off error occasionally results in an incorrect inclusion decision, because a zero-sum integral, although theoretically possible, cannot be attained exactly on finite-precision machines. The technique also exhibits inferior runtime complexity, for two reasons: it uses floating-point trigonometric functions, which are compute-intensive, and it must access every boundary element when accumulating the integral.

1.2.3 The Method of Refined Triangulation.

Description.  This  method,  due  to  Kirkpatrick  [K1],  proceeds  by  triangulating  a  planar 
subdivision  incrementally  into  bounded  regions  of  finer  granularity.  A  search  is  performed  to 
determine  if  the  query  point  resides  in  one  of  the  triangulated  subdivisions. 

Barriers to Implementation. Although the algorithm has O(log n) time complexity, it remains an open question whether its constants are small enough for real-time repetitive queries [E2]. The best constants achieved to date are based on results obtained by analysis of the Four Color Theorem, and it is not clear that they allow a fast implementation.


Table  1.  Point-in-polygon  algorithms. 


2.  THE  INNER  ANNULUS:  A  PROXIMITY-BASED  APPROACH  TO  INCLUSION  TESTING. 


In a digital domain, a boundary may be represented as a linked list of coordinates, with the head contiguous with the tail. If the boundary is oriented counterclockwise, then the left-handed limit of the boundary is on the interior. If a counterclockwise traversal of the list is performed, the set of discrete points to the left may be collected into another list called the inner annulus [C2]. Boundary inclusion testing for a query point is performed by comparing the distance to the nearest boundary point with the distance to the nearest annulus point; if the distance to the annulus is smaller, the query point is in the interior. Since the technique is metrical, it provides the distance and direction to the boundary along with the inclusion decision. Notably, inclusion testing thus reduces to a special kind of proximity testing.
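The idea can be illustrated with a deliberately simple sketch, which is not the optimized opcode encoding of this paper. Here the annulus point for each coordinate is obtained by rotating the central-difference tangent 90 degrees to the left and stepping one grid cell in that direction (an assumed construction). Candidates that land on the boundary are dropped, and the nearest-point searches are brute force, O(n) per query rather than O(log n). All names are hypothetical.

```python
def inner_annulus(boundary):
    """Annulus points for a counterclockwise, 8-connected digital boundary (sketch).

    For each coordinate, the central-difference tangent (next - previous) is
    rotated 90 degrees to the left and reduced to a single grid step; the
    neighbor in that direction is kept unless it lies on the boundary itself.
    """
    on_boundary = set(boundary)
    n = len(boundary)
    annulus = set()
    for i, (x, y) in enumerate(boundary):
        (xp, yp), (xn, yn) = boundary[i - 1], boundary[(i + 1) % n]
        lx, ly = -(yn - yp), xn - xp  # left-hand normal of the tangent
        step = ((lx > 0) - (lx < 0), (ly > 0) - (ly < 0))
        candidate = (x + step[0], y + step[1])
        if step != (0, 0) and candidate not in on_boundary:
            annulus.add(candidate)
    return annulus


def nearest_sq_dist(q, points):
    """Brute-force squared distance from q to the nearest of points (O(n))."""
    return min((q[0] - a) ** 2 + (q[1] - b) ** 2 for a, b in points)


def inside_by_annulus(q, boundary, annulus):
    """Inside iff the nearest annulus point is strictly closer than the nearest boundary point."""
    return nearest_sq_dist(q, annulus) < nearest_sq_dist(q, boundary)


# 5 x 5 square boundary, listed counterclockwise starting at (0, 0).
square = ([(x, 0) for x in range(4)] + [(4, y) for y in range(4)]
          + [(x, 4) for x in range(4, 0, -1)] + [(0, y) for y in range(4, 0, -1)])
ring = inner_annulus(square)
print(sorted(ring))
print(inside_by_annulus((2, 2), square, ring), inside_by_annulus((6, 2), square, ring))
```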

2.1 Adopting a Convention to Assure a Unilateral Interior.

If  the  orientation  of  a  boundary  is  assured  prior  to  run  time,  an  automated  inclusion  testing 
process  can  exploit  knowledge  of  a  unilateral  interior  during  boundary  traversal.  A  left-handed 
convention  is  adopted  to  achieve  the  search  space  reduction.  By  left-handed,  it  is  meant  that  the 
boundary  is  oriented  counterclockwise,  to  assure  the  interior  is  to  the  left.  However,  if  the  boundary  is 
multiply-connected,  the  boundary  of  any  hole  it  contains  must  be  oriented  in  a  clockwise  direction, 
because the interior of the hole is outside the boundary [R1]. In Figure 2, a continuous boundary is
represented by the solid line, and its inner annulus by the dashed line. The digital boundary is
represented by black squares, and the annulus by white squares.


Figure  2.  Continuous  and  digital  versions  of  the  inner  annulus. 


2.2  Automated  Counterclockwise  Orientation  of  a  Digital  Boundary. 

An algorithm which automates the counterclockwise orientation of a boundary is published
elsewhere [C3]; it is described only in passing here. The logic is as follows: the boundary's list of
coordinates is searched sequentially for a coordinate with maximal abscissa, and its predecessor and
successor coordinates are retrieved. The difference between the ordinates of the maximal-abscissa
coordinate and its predecessor is computed, as is the difference between the ordinates of the successor
and the maximal-abscissa coordinate. If either difference is less than zero, the boundary is oriented
clockwise; otherwise it is counterclockwise. It is a simple matter to reverse the boundary list if the
orientation is opposite that desired.
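A sketch of the orientation test as summarized above, assuming the boundary is a Python list of (x, y) tuples whose last element is adjacent to its first. The names `is_clockwise` and `orient_counterclockwise` are illustrative and do not come from [C3]; ties in abscissa are resolved by taking the first maximal coordinate in list order.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def is_clockwise(boundary: List[Point]) -> bool:
    """Max-abscissa test: True if the closed boundary list runs clockwise."""
    n = len(boundary)
    # First coordinate with maximal abscissa, plus its cyclic neighbors.
    i = max(range(n), key=lambda k: boundary[k][0])
    pred, center, succ = boundary[i - 1], boundary[i], boundary[(i + 1) % n]
    dy_pred = center[1] - pred[1]  # ordinate change arriving at the extreme point
    dy_succ = succ[1] - center[1]  # ordinate change leaving it
    return dy_pred < 0 or dy_succ < 0


def orient_counterclockwise(boundary: List[Point]) -> List[Point]:
    """Return the boundary in counterclockwise order, reversing it if necessary."""
    return boundary[::-1] if is_clockwise(boundary) else list(boundary)


ccw = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
print(is_clockwise(ccw), is_clockwise(ccw[::-1]))
```

The test is linear in the boundary length, since it needs one scan for the maximal abscissa and constant work afterwards.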
2.3 The Inner Annulus Technique Accommodates Problematic Boundary Conditions.

2.3.1  Multiply-connected  Sets. 

If  a  multiply-connected  boundary  contains  a  single  hole,  then  the  hole's  boundary  may  be 
oriented  in  a  clockwise  fashion  using  the  algorithm  described  above.  This  step  is  necessary  because  the 
interior  of  the  outer  boundary  is  actually  exterior  to  the  boundary  of  the  hole. 

If  the  hole  itself  contains  another,  the  inner  hole  must  be  oriented  counterclockwise,  since  its 
interior  is  also  the  interior  of  the  outermost  boundary.  The  general  rule  is  as  follows:  let  a  boundary  be 
oriented  counterclockwise.  Orient  any  hole  it  directly  contains  in  a  clockwise  fashion.  If  this  hole  is 
multiply-connected,  then  any  hole  it  contains  must  be  oriented  counterclockwise,  etc.  Continue  until  no 
multiple  connectedness  remains. 
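The nesting rule can be sketched as a parity test on depth, assuming each boundary's nesting depth (0 for the outermost boundary, 1 for a hole, 2 for an island inside a hole, and so on) is already known. For brevity this sketch substitutes the shoelace signed-area test for the max-abscissa test of section 2.2; the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def signed_area2(boundary: List[Point]) -> int:
    """Twice the signed (shoelace) area: positive for counterclockwise order."""
    n = len(boundary)
    return sum(boundary[i][0] * boundary[(i + 1) % n][1]
               - boundary[(i + 1) % n][0] * boundary[i][1] for i in range(n))


def orient_for_depth(boundary: List[Point], depth: int) -> List[Point]:
    """Even depth: counterclockwise; odd depth (a hole): clockwise."""
    want_ccw = depth % 2 == 0
    is_ccw = signed_area2(boundary) > 0
    return list(boundary) if is_ccw == want_ccw else boundary[::-1]
```

Alternating orientation with depth keeps the interior of every region on the left of its bounding list, which is what the inner annulus construction relies on.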

2.3.2  Counterclockwise  Orientation  of  Non-Simple  Boundaries. 

A  self-intersecting  boundary  is  called  non-simple.  In  Figure  3,  the  boundary  on  the  left  is 
non-simple.  It  may  be  oriented  in  a  counterclockwise  direction  with  the  following  retracing  operation. 
Find  a  point  with  maximum  abscissa  and  assure  that  its  predecessor  and  successor  are  in 
counterclockwise  order  (if  not  then  the  boundary  list  must  be  reversed).  Starting  from  that  point, 
traverse  the  boundary  in  the  direction  of  the  successor,  and  proceed  to  the  right  at  self-crossing  areas. 
Continue  collecting  points  until  the  predecessor  is  encountered.  The  points  collected  constitute  a  simple 
boundary, which is ordered counterclockwise. This linear-time algorithm may be invoked offline in a
preprocessing step.


Figure  3.  Retracing  a  non-simple  boundary  to  obtain  a  simple  one. 

2.3.3  Areal  Collapse  due  to  Poor  Digital  Resolution. 

If a boundary contains a region narrower than the resolution of the digitizing process, the region
collapses into a linear stub during digitization. In Figure 4, the closed boundary on the left exhibits a
small convex region at its lower right. During digitization, resolution error causes the region to collapse
(points 2-3). When the boundary is traversed in a counterclockwise direction, the ordered sequence of
points {0-1-2-3-3-2-1-4} is visited. The problematic area may be detected with a preprocessing step which
traverses the boundary in counterclockwise order while looking for strings of duplicate boundary
coordinates, where the second occurrence of the string is encountered in reverse order. Such duplicate
coordinates may be tagged by turning on a parity bit in the upper portion of the word used to represent
the coordinate. This concept is further developed in section 2.8.
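One way to sketch the detection of collapsed regions, under the assumption that the boundary is a plain (non-cyclic) Python list of coordinates: look for a fold, where the list doubles back on itself either with a repeated tip (3-3) or around a single tip (2-3-2), and grow the fold outward while the two sides match. The function name is hypothetical, folds that straddle the end of the list are not detected, and the sketch tags every coordinate of the mirrored run, including the junction point (point 1 in the Figure 4 sequence); the text does not settle whether that junction should be tagged.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]


def find_collapsed(points: List[Point]) -> Set[int]:
    """Indices of coordinates lying in a mirrored run such as 1-2-3-3-2-1."""
    n = len(points)
    tagged: Set[int] = set()
    for i in range(n - 1):
        folds = []
        if points[i] == points[i + 1]:  # even fold: tip recorded twice (3-3)
            folds.append((i, i + 1))
        if i > 0 and points[i - 1] == points[i + 1]:  # odd fold: single tip (2-3-2)
            folds.append((i - 1, i + 1))
        for lo, hi in folds:
            # Grow outward while the run reads the same forwards and backwards.
            while lo > 0 and hi < n - 1 and points[lo - 1] == points[hi + 1]:
                lo, hi = lo - 1, hi + 1
            tagged.update(range(lo, hi + 1))
    return tagged


# The Figure 4 order 0-1-2-3-3-2-1-4, with illustrative coordinates.
stub = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 0), (2, 0), (1, 0), (1, 1)]
print(sorted(find_collapsed(stub)))
```

Each tagged index is where a preprocessing pass would set the degeneracy bit of that coordinate's word.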


2.4  The  Relationship  of  the  Inner  Annulus  of  a  Boundary  to  the  Set  of  Normals  to  the  Boundary. 

A point in the plane may be connected diagonally (d-connected) or non-diagonally
(4-connected) [R1]. Therefore, in a planar application, there are eight ways for an annulus point to be
attached to a boundary point. This section demonstrates that the inner annulus is actually composed of
the set of unit normal vectors which point into the interior of a boundary.

2.4.1 The Combinatorics of Local Boundary Behavior.

Since  during  preprocessing  the  annulus  technique  assures  that  a  closed  boundary  does  not 
self-intersect,  the  behavior  of  any  local  boundary  section  may  be  explicitly  described.  A  boundary 
3-tuple  is  a  set  of  three  counterclockwise-oriented  boundary  points  called  respectively  the  predecessor, 
the  center,  and  the  successor.  The  predecessor  and  center  points  of  a  3-tuple  may  be  connected  in  any 
of  eight  ways,  whereas  the  center  and  successor  may  subsequently  be  connected  in  only  five  ways, 
producing  a  total  of  forty  combinations.  However,  an  annulus  element  can  be  attached  to  a  point  in  the 
plane  in  only  eight  ways,  since  any  planar  point  has  exactly  eight  neighbors.  Therefore,  it  is  necessary  to 
discover  a  many-to-one  mapping  which  produces  a  range  of  eight  states  from  a  domain  of  forty.  The 
mapping  is  obtained  by  observing  the  magnitudes  of  the  differences  between  the  abscissas  and 
ordinates  of  contiguous  elements  of  the  3-tuple. 

An annulus element is actually a digital representation of the unit vector to the left of the center point.
The logic which produces the unit vector is a function of three arguments: the predecessor, the center,
and the successor coordinates. Since this 3-tuple is ordered counterclockwise, the inward-pointing unit
vector is to the left and is orthogonal to the direction of the 3-tuple. Note that if the order of the
3-tuple is reversed (i.e., changed to successor, center, predecessor), the same function produces a unit
vector to the right. In fact, the function is utilized in this very manner to implement the algorithm for
non-simple boundaries described above in section 2.3.2.

For elaboration of the equivalence between an annulus element and an inside unit vector, refer to
Figure 5. The black boxes represent boundary 3-tuple behaviors, and the associated white box is the
annulus element produced for that specific behavior. The arrows on the far right are a legend depicting
the direction from the predecessor to the center box, while the numbers in the far left column encode that
direction. For example, the up arrow represents the vector "01", which corresponds to: no change in x;
increment y by 1. The numbers to the left of each icon give the direction from the middle box to the
successor. Referring to the upper left icon, the boundary's directional behavior is encoded by the string
"0101", which represents two consecutive northerly steps. The numbers at the top of each icon are the
inner annulus attachment instruction. Therefore "-10" decodes to: "attach the annulus element by
decrementing the x-value by 1 and leaving the y-value alone."


Figure 5. The annulus element is attached to the left of a boundary 3-tuple.

Eight of the forty behaviors are linear combinations of the other thirty-two. This set includes the
fourth elements of the first four rows and the third elements of the last four rows. These eight may be
safely discarded because their conditions are duplicated by other behaviors. Thus we are required to
develop a 32-to-8 mapping. We proceed in a backward-chaining fashion, looking back from the eight
output states to the various combinations of abscissa and ordinate differences which produce them.

Note that several behaviors produce the same output. For example, the instruction "-10" is
produced by the first and second boundary behaviors in the first row, and by the second and fourth
behaviors in the fifth row. Thus, four different boundary behaviors all generate the "-10" annulus
attachment  instruction,  which  dictates  that  the  annulus  element  be  attached  at  the  left  of  center  of  the 
3-tuple.  These  behaviors  may  therefore  be  combined  into  the  system  of  conditional  clauses  represented 
in  the  table  at  the  upper  left  of  Figure  6.  Note  that  the  commonality  for  the  conditional  test  lies  in  the 
fact  that  the  ordinate  differences  are  both  equal  to  one,  for  all  four  behaviors.  In  this  spirit  we  continue, 
and  map  the  remaining  twenty-eight  behaviors  of  Figure  5  into  the  tables  depicted  in  Figure  6.  This 
explicit  mapping  constitutes  the  formal  design  specification  for  an  algorithm,  which  we  now  develop  in 
detail. 
Figure 6. The compact mapping which relates boundary behavior to annulus vector attachment.

2.5  The  Formal  Design  Specification  for  Computing  the  Inner  Annulus  Vector. 

A query point may be considered as a query vector, drawn from the origin to the query point.
When a normal N is drawn from the query point to the boundary, it first meets the boundary at some
point p. But also normal to the boundary at p is some annulus element a, as specified in Figure 5. The
vector drawn from the query point to the boundary point p is called the boundary vector, and the vector
from the query point to the annulus element is called the annulus vector. The annulus vector is collinear
with the boundary vector, since both lie on the normal, and the magnitude of the annulus vector relative
to the boundary vector may be used to make an inclusion decision. This concept is formalized below.

Let r be a query vector and $q = (x_q, y_q)$ be the boundary vector nearest to r on the closed,
counterclockwise-oriented boundary β. Let $p(q) = (x_p, y_p)$ be the predecessor of q in β, and
$s(q) = (x_s, y_s)$ be the successor of q in β. Let {i, j} be a basis set of unit vectors as conventionally
defined for the plane. Then the following logic provides the equations for computing the annulus vector $a_p$:

Let
$$\Delta x_p = x_q - x_p, \qquad \Delta y_p = y_q - y_p, \qquad \Delta x_s = x_s - x_q, \qquad \Delta y_s = y_s - y_q.$$
The ordering of the tests is important. Note that conditions (c1)-(c4) are heterogeneous; i.e., they
involve a mix of abscissa and ordinate differences. It is crucial that these conditional clauses be tested
before the homogeneous tests (c5)-(c8), because otherwise the mapping is not guaranteed to be
unambiguous: a behavior could satisfy more than one clause. For example, note that the homogeneous test
which generates an annulus element on the "right" specifies that the respective ordinate differences be
equal to -1. Suppose that this test were performed before any of the others. Referring back to Figure 6,
notice that the heterogeneous tests for "upper right" and "upper left" contain two clauses which satisfy
the condition for "right", which would result in an erroneous annulus attachment. Thus, the
heterogeneous tests must be sequenced before the homogeneous tests to guarantee that each behavior
produces exactly one attachment.
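The full clause list (c1)-(c8) is not reproduced in this text, so the following sketch is deliberately partial: it encodes only clause c1 (Δy_p = Δx_s = 1 gives "upper left", as in the worked example of section 2.6) and the two homogeneous tests named above ("left" when both ordinate differences are 1, "right" when both are -1). It shows why order matters: for the worked example's 3-tuple, testing the homogeneous "left" clause first yields a different attachment. Names are hypothetical, and unmatched behaviors return `None`.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[int, int]
Deltas = Tuple[int, int, int, int]  # (dx_p, dy_p, dx_s, dy_s)
Clause = Tuple[str, Callable[[Deltas], bool], Point]


def deltas(pred: Point, center: Point, succ: Point) -> Deltas:
    """Differences of a counterclockwise 3-tuple, as used in section 2.6's example."""
    return (center[0] - pred[0], center[1] - pred[1],
            succ[0] - center[0], succ[1] - center[1])


# PARTIAL clause lists: only the clauses stated in the text are included.
HETEROGENEOUS: List[Clause] = [
    ("c1: upper left", lambda d: d[1] == 1 and d[2] == 1, (-1, 1)),
]
HOMOGENEOUS: List[Clause] = [
    ("left", lambda d: d[1] == 1 and d[3] == 1, (-1, 0)),
    ("right", lambda d: d[1] == -1 and d[3] == -1, (1, 0)),
]


def annulus_element(pred: Point, center: Point, succ: Point,
                    clauses: List[Clause]) -> Optional[Point]:
    """First-match dispatch: apply the offset of the first clause that fires."""
    d = deltas(pred, center, succ)
    for _name, test, (ox, oy) in clauses:
        if test(d):
            return (center[0] + ox, center[1] + oy)
    return None  # behavior not covered by this partial clause list


p, c, s = (101, 99), (100, 100), (101, 101)
print(annulus_element(p, c, s, HETEROGENEOUS + HOMOGENEOUS))
print(annulus_element(p, c, s, HOMOGENEOUS + HETEROGENEOUS))
```

With the heterogeneous clause first, the example 3-tuple attaches its annulus element at (99, 101); reversing the order lets "left" fire first and attaches it at (99, 100) instead.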

2.6  The  Distance  to  the  Annulus  Vector  as  a  Measure  of  Inclusion. 

The  inner  annulus  technique  relies  on  a  comparison  of  proximity  information  to  arrive  at  an 
inclusion  decision.  For  computational  efficiency,  the  distance  metric  implemented  is  the  d4  distance, 
also  variously  known  as  the  Manhattan  distance,  or  the  city-block  distance  [R1].  This  distance  metric 
avoids  the  multiplication  and  radical  operations  inherent  to  the  Euclidean  metric,  and  may  be  efficiently 
implemented  with  integer  arithmetic. 

Definition 2.6.1 The d4 distance.

Let $p = (x_1, y_1)$ and $q = (x_2, y_2)$ be two points in the plane. Then the d4 distance from p to q,
denoted $d_4(p, q)$, is defined to be:

$$d_4(p, q) = |x_1 - x_2| + |y_1 - y_2|$$

Because  the  d4  distance  between  two  points  equals  the  Euclidean  distance  only  when  either  the 
respective  ordinates  or  abscissas  of  the  two  points  are  themselves  equal,  one  must  take  care  to  devise  a 
proximity  test  which  produces  the  same  inclusion  decision  as  the  true  Euclidean  metric. 

Definition  2.6.2  Trichotomy  of  Metrical  Inclusion. 

Let q be a query vector, β a closed boundary, and p the nearest boundary vector to q. Let $a_p$ be the
annulus vector attached to p.

Point q is said to be on boundary β if and only if $d_4(q, p) = 0$;
else point q is said to be inside boundary β if and only if $d_4(q, a_p) < d_4(q, p)$;
else point q is said to be outside boundary β if and only if $d_4(q, a_p) > d_4(q, p)$.
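A sketch of Definitions 2.6.1 and 2.6.2 in integer arithmetic. The names are hypothetical, and the sketch returns an explicit "tie" when the two d4 distances are equal, a case the trichotomy as stated does not assign.

```python
from typing import Tuple

Point = Tuple[int, int]


def d4(p: Point, q: Point) -> int:
    """City-block (Manhattan) distance: integer subtraction and absolute values only."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def classify(q: Point, p: Point, a_p: Point) -> str:
    """Definition 2.6.2; p is the nearest boundary point, a_p its annulus element."""
    d_boundary = d4(q, p)
    if d_boundary == 0:
        return "on"
    d_annulus = d4(q, a_p)
    if d_annulus < d_boundary:
        return "inside"
    if d_annulus > d_boundary:
        return "outside"
    return "tie"  # equal d4 distances: not covered by the definition as stated


print(classify((98, 101), (100, 100), (99, 101)))
print(classify((50, 50), (100, 100), (99, 101)))
```

For the worked example that follows, `classify((50, 50), (100, 100), (99, 101))` returns `"tie"`, since both d4 distances equal 100.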

Example. Is the coordinate q = (50, 50) inside a boundary with closest point p = (100, 100), where the
predecessor of p is (101, 99) and the successor of p is (101, 101)?

Solution. $\Delta x_p = 100 - 101 = -1$; $\Delta y_p = 100 - 99 = 1$; $\Delta x_s = 101 - 100 = 1$; $\Delta y_s = 101 - 100 = 1$.


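Since both $\Delta y_p$ and $\Delta x_s$ are equal to 1, clause c1 is satisfied, producing the annulus element
$a_p = (100 - 1, 100 + 1) = (99, 101)$. The d4 distance from q to $a_p$ is $|50 - 99| + |50 - 101| = 100$, and the
d4 distance from q to p is $|50 - 100| + |50 - 100| = 100$. The two distances are equal, so neither strict
inequality of Definition 2.6.2 holds; this tie is an instance of the care the d4 metric requires, noted
above, and the inclusion decision for q cannot be read off the comparison $d_4(q, a_p) < d_4(q, p)$ alone.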


Figure  7.  The  magnitudes  of  the  boundary  and  annulus  vectors  decide  inclusion. 

Theorem 2.6. The Boundary Vector and Corresponding Annulus Vector are Linearly Dependent.

Let q be a query point and β be the boundary of a closed simple curve. Let N = qp be the normal vector
drawn from q through the boundary, and let p be the point of intersection. Let T be the tangent
through p, and a be the annulus element attached to p. Let the annulus vector be denoted by qa, and
let the boundary vector be denoted by qp. Then qa and qp are linearly dependent.

Proof: $\mathbf{qp} \perp T$ by definition, and $\mathbf{qa} \perp T$ by construction. Since in the plane there is only one line
through a point orthogonal to a given line, qp and qa are collinear. But collinear vectors are linearly
dependent, which means that there exist $\rho_1$ and $\rho_2$, not both zero, such that:

$$\rho_1\,\mathbf{qa} + \rho_2\,\mathbf{qp} = \mathbf{0} \quad\Longrightarrow\quad \mathbf{qa} = -\frac{\rho_2}{\rho_1}\,\mathbf{qp}, \qquad \rho_1 \neq 0,$$

where $\rho_1 \neq 0$ because otherwise $\rho_2\,\mathbf{qp} = \mathbf{0}$ with $\rho_2 \neq 0$, which would place q on β.

If q is on the interior of β, then by definition $\|\mathbf{qa}\| < \|\mathbf{qp}\|$. Since qa and qp point the same way along
the normal, $-\rho_2/\rho_1 > 0$, so $-\rho_2/\rho_1 < 1$; taking $\rho_1 > 0$, this gives $\rho_1 > -\rho_2$.
Conversely, if q is outside β, a similar argument shows that $\rho_1 < -\rho_2$.

2.7  An  Annulus  Attachment  Opcode. 

In the last section it was demonstrated that in the plane there are eight ways in which to attach
an annulus vector to a boundary vector. Since eight states may be minimally encoded with three bits, an
annulus encoding algorithm is storage-optimal if it uses exactly three bits to store the annulus
attachment instruction. The figure below demonstrates a candidate opcode convention to store the
instruction. For example, the opcode "111" decodes to the instruction "attach the annulus unit vector
at the upper left of the coordinate". At query time, the opcode is used to compute the abscissa and
ordinate of the attached inner annulus vector. Respective distances from the query point to the nearest
boundary point and the annulus vector are then compared to arrive at the inclusion decision.


2.8  A  Point-in-Polygon  Computing  Machine. 

Since the planar annulus opcode consumes only three bits of information, it is feasible to embed
it within a boundary coordinate, with a small corresponding loss in the number of coordinates
expressible. One way to accomplish this within a computer word is depicted below. This scheme
accommodates 2^14 = 16,384 possible abscissa values and 16,384 possible ordinate values, packed in a
32-bit word along with the annulus opcode.


Figure  9.  Packing  the  inclusion  information  into  a  coordinate  word. 

The  embedding  process  may  be  performed  during  preprocessing  with  a  single  pass  over  the 
boundary.  The  ordinate  of  a  boundary  coordinate  is  stored  in  bits  0-13,  and  the  abscissa  in  bits  14-27. 
The  annulus  opcode  is  precomputed  and  written  into  bits  28-30.  Bit  31  is  used  to  handle  boundaries 
which  have  collapsed  due  to  inferior  resolution  during  the  digitization  process.  A  degeneracy  exists  at  a 
boundary  point  if  and  only  if  bit  31  is  set  to  1,  which  indicates  that  the  annulus  element  should  be  set 
equal  to  the  boundary  element  at  that  point.  In  the  example  of  Figure  9,  the  annulus  opcode  "111" 
instructs that an inner annulus element be attached at the upper left of boundary coordinate (1, 1), at
coordinate (0, 2).

At run time, masks may be used to decode the annulus opcode and coordinate information. In
octal notation, the annulus opcode mask is 16000000000. It is feasible to dedicate a decoding register in
hardware for the unmasking process. This register could be coupled in a pipeline with another chip
possessing arithmetic logic which computes the respective distances from the query point to the annulus
element and the boundary element. The final pipeline stage might be a comparator register that selects
the smaller distance.
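The word layout above can be sketched with shifts and masks. The field positions follow the text (ordinate in bits 0-13, abscissa in bits 14-27, opcode in bits 28-30, degeneracy flag in bit 31); the opcode table holds only the single entry the text gives ("111" for upper left), so the other seven codes are left out rather than guessed. Function names are hypothetical.

```python
from typing import Tuple

ABS_SHIFT, OP_SHIFT, DEGEN_SHIFT = 14, 28, 31
COORD_MASK = (1 << 14) - 1       # 14 bits per axis: 16,384 values
OPCODE_MASK = 0b111 << OP_SHIFT  # bits 28-30

# PARTIAL opcode table: only "111" (upper left) is given in the text.
OPCODE_OFFSETS = {0b111: (-1, 1)}


def pack(x: int, y: int, opcode: int, degenerate: bool = False) -> int:
    """Ordinate in bits 0-13, abscissa in 14-27, opcode in 28-30, degeneracy in 31."""
    if not (0 <= x <= COORD_MASK and 0 <= y <= COORD_MASK and 0 <= opcode <= 0b111):
        raise ValueError("field out of range")
    return (int(degenerate) << DEGEN_SHIFT) | (opcode << OP_SHIFT) | (x << ABS_SHIFT) | y


def unpack(word: int) -> Tuple[int, int, int, bool]:
    x = (word >> ABS_SHIFT) & COORD_MASK
    y = word & COORD_MASK
    opcode = (word & OPCODE_MASK) >> OP_SHIFT
    degenerate = bool((word >> DEGEN_SHIFT) & 1)
    return x, y, opcode, degenerate


def annulus_from_word(word: int) -> Tuple[int, int]:
    x, y, opcode, degenerate = unpack(word)
    if degenerate:  # collapsed region: annulus element equals the boundary element
        return (x, y)
    dx, dy = OPCODE_OFFSETS[opcode]  # KeyError for opcodes missing from the partial table
    return (x + dx, y + dy)


print(oct(OPCODE_MASK))
print(annulus_from_word(pack(1, 1, 0b111)))
```

Decoding a word costs one shift and one mask per field, which is the work the proposed hardware decoding register would perform; the printed mask agrees with the octal value 16000000000 given above, and the Figure 9 example decodes to (0, 2).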

2.9  The  Computational  Complexity  of  the  Planar  Annulus  Encoding  Technique. 

For  domains  with  modest  real  world  complexity,  linear  search  is  adequate  to  find  the  closest 
point  (and  encoded  annulus  element)  to  a  query  point.  This  is  the  case  for  most  real  world  applications 
which  are  displayed  on  a  single  CRT  screen.  However,  when  boundaries  become  asymptotic,  or  when 
the  map  data  becomes  overwhelmingly  dense,  as  on  some  topographic  maps  with  extreme  elevation 
changes,  more  efficient  data  structures  and  algorithms  are  recommended.  Under  such  conditions,  the 
author  suggests  the  following  approach. 

Preprocessing. O(n log n). The annulus technique begins by locating the nearest boundary point to a
query point. The Voronoi diagram is an efficient representation scheme for proximity processing [E1,
P1]. The Voronoi diagram for a set of n points can be constructed in O(n log n) preprocessing time.

Storage. O(n). Since the annulus opcode uses three bits, the constant multiplier of n is (w + 3)/w,
where w is the number of bits in the word used to encode a coordinate. In the plane, the storage
requirement is 1.09n, assuming a 32-bit word size with both abscissa and ordinate packed into the
same word. However, if the encoding scheme discussed in section 2.8 is used, the annulus opcode is
stored in the same word as a coordinate, which reduces the storage requirement to exactly n. Of course,
in this case there is a corresponding loss in the number of coordinates expressible. Further
improvements to achieve sublinear storage could be made if one elected to represent a boundary with
a polynomial or polygonal approximation. The annulus-based approach readily accommodates data
compression schemes, and the query time would improve due to a smaller search space; the price paid is
the error introduced by the approximation scheme.

Query Time. O(log n). The closest boundary point to a query point can be obtained in O(log n) time
using the preconstructed Voronoi diagram. The annulus attachment opcode is fetched simultaneously,
since it is packed in the upper portion of the coordinate. Negligible constant time is required to
compare the two distances from the query point. Thus the query time complexity is O(log n).

3.  AN  EXTENSION  TO  THREE  DIMENSIONS. 

The  extension  of  the  planar  annulus  technique  to  three  dimensions  is  straightforward.  A 
three-dimensional  object  may  be  conceptualized  as  a  stack  of  planar  boundaries,  each  one  pixel  in 
height. Testing for inclusion within the solid is equivalent to locating the nearest point on the nearest
planar boundary, along with the annulus element attached to it; the respective distances are then
compared to arrive at the inclusion decision.

Figure 10 illustrates the annulus technique for a sphere and another object modeled as a stack of
planar boundaries. In the case of the sphere, query point p is nearest to some point on the equator of the
sphere. But the equator is a planar object, so it has an inner annulus, and the planar annulus logic is
applied to arrive at an inclusion decision. For the modeled object, query point p is nearest to q, which is
an element of the highest boundary in the stack. But q is attached to annulus element a, which has been
precomputed as a function of local boundary behavior about q. Since the distance from p to q is less than
the distance from p to a, it is decided that point p is on the exterior.


Figure 10. The extension of the annulus technique to three dimensional objects.

3.1 The Complexity of the Annulus Technique in Three Dimensions.

The closest point of a planar boundary to a query point can be found in O(log m) query time,
where m is the length of the boundary. This operation must be performed p times, where p is the
number of boundaries contained in the stack which comprise the solid. Therefore, the point inclusion
time complexity for a three-dimensional solid is

$$\log_2 m_1 + \log_2 m_2 + \cdots + \log_2 m_p = \log_2 \prod_{i=1}^{p} m_i .$$

The  optimal  time  complexity  is  an  open  research  issue.  If  a  three-dimensional  solid  is  represented  as  a 
cell  bounded  by  a  complex  of  n  intersecting  planes,  then  it  has  been  shown  independently  by  two 
researchers  [Cl ,  E3]  that  the  three-dimensional  point  inclusion  problem  can  be  solved  in  Of  log^n  ]  query 
time, with a storage requirement of O(n^3).

4.0  GENERALIZATION  TO  HIGHER  DIMENSIONS. 

In  this  section  an  attempt  is  made  to  generalize  the  annulus-based  inclusion  testing  technique  to 
an  arbitrary  number  of  dimensions.  Since  the  technique  seeks  to  attach  an  annulus  vector  to  the 
boundary  point  nearest  a  query  point,  it  is  necessary  to  know  how  many  neighbors  a  boundary  point 
possesses  in  d  dimensions,  to  derive  the  number  of  bits  required  to  encode  the  annulus  attachment 
instruction.  With  this  goal  in  mind,  we  proceed  to  develop  a  set  of  five  axioms  which  describe  a 
methodology  to  inductively  construct  a  new  dimension  from  a  previous  one.  The  first  four  axioms 
closely  parallel  the  Peano  axioms  for  the  natural  numbers.  The  fifth  axiom  diverges  from  Peano  when 
we  postulate  the  construction  of  a  new  dimension. 

4.1 Axioms of Inductive Dimensionality.

Axiom D1. A point has dimension 0, and is without axis.

Axiom D2. Every dimension D has a unique successor dimension D + 1, which has a unique axis. D is said
to be the predecessor dimension of D + 1.

Axiom D3. A point is not a successor dimension of any dimension.
Axiom D4. Distinct dimensions have distinct successor dimensions.

Axiom D5. Bilateral Dilation. Dimension D + 1 is constructed by interposing a hyperplane from
dimension D between two other hyperplanes from D, and dilating the structure along the axis of D + 1.




Figure  11.  Building  higher  dimensions. 

The fifth axiom permits us to perform two operations (replication and dilation) on an object in a
lower dimension to produce an object in a new dimension. Figure 11 illustrates successive applications
of the operations to produce the first dimension from a point source, the plane from a line source, and
a cube from a planar source. We can conceive of the operation of bilateral replication in the third
dimension to get to the fourth (a cube is interposed between two others), but we cannot visualize the
axis along which to dilate the composite, because our spatial world is restricted to three dimensions.
However, if we subscribe to the axioms, we derive the following results.

4.2 Neighbor Theorem (number of Digital Neighbors in d-Space). In d-space, the number of digital
points neighboring a reference point is $3^d - 1$.

Proof (induction):

Step 1. If d = 1, the space is linear, and the number of neighbors of a point is $3 - 1 = 2$.

Step 2. Assume that in k-space, the number of neighbors of a reference point is $s_k = 3^k - 1$.
Step 3. Prove that in (k + 1)-space, the number of neighbors is $s_{k+1} = 3^{k+1} - 1$.
Since $s_k = 3^k - 1$, $3^k$ is the total count of the reference point plus its neighbors.
Axiom D5 (bilateral dilation) applied to this set creates $3 \cdot 3^k$ points in dimension k + 1, of
which one is the reference point and the remaining $3 \cdot 3^k - 1$ are its neighbors. But
$3 \cdot 3^k - 1 = 3^{k+1} - 1$. QED.
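A brute-force check of the Neighbor Theorem for small d (an enumeration, not a proof): a digital neighbor in d-space is a nonzero offset with each coordinate in {-1, 0, 1}, so each new axis triples the count of the reference point plus its neighbors, much as bilateral replication produces three copies. The function name is hypothetical.

```python
from itertools import product
from typing import List, Tuple


def neighbors(d: int) -> List[Tuple[int, ...]]:
    """All nonzero offsets in {-1, 0, 1}^d, i.e. the digital neighbors of the origin."""
    return [off for off in product((-1, 0, 1), repeat=d) if any(off)]


for d in range(1, 7):
    count = len(neighbors(d))
    assert count == 3 ** d - 1  # Neighbor Theorem
    print(d, count)
```

The enumeration grows as $3^d$, so it is only practical as a sanity check for small d.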

4.3 Inner Annulus Opcode Storage Theorem (bit length of the annulus opcode in d dimensions). The
inner annulus opcode to effect an inclusion decision concerning a query point in d dimensions may be
encoded in $\log_2(3^d - 1)$ bits.

Proof. Theorem 4.2 asserts that a reference point in d-space has $3^d - 1$ neighbors. But this is
precisely the number of ways in which an inner annulus element may be attached to a reference point.
The number of bits necessary to encode $3^d - 1$ digital neighbors is $\log_2(3^d - 1)$. QED.

4.4 Opcode Storage Approximation Corollary (annulus opcode bit length approximation). The number
of bits necessary to encode the inner annulus opcode in d dimensions is bounded from above by 1.6d.

Proof. From Theorem 4.3, the number of bits is exactly $\log_2(3^d - 1)$. But

$$\log_2(3^d - 1) < \log_2 3^d = d \log_2 3 < 1.6\,d,$$

since $\log_2 3 \approx 1.585$. QED.


4.5  The  Storage  Requirement  of  the  Annulus  Encoding  Technique  in  Higher  Dimensions. 

Given a word length of w bits, with κ bits used to encode the annulus opcode, the storage
requirement is $n + (\kappa / w)\,n = ((w + \kappa)/w)\,n$, which is clearly O(n). Application of Theorems 4.2 and
4.3 produces Table 2, which depicts the growth of the annulus opcode and storage constant in higher
dimensions.


d = dimension; n = number of neighbors; κ = opcode bits; storage inclusion constant = (w + κ)/w with w = 32.

d      n          κ                             storage inclusion constant
1*     2          1.00                          1.03
2      8          3.00                          1.09
3      26         4.63                          1.14
4      80         6.25                          1.20
5      242        7.89                          1.25
6      728        9.42                          1.29
d      3^d - 1    m - 1 + (3^d - 1)/2^m **      (w + κ)/w

* Inclusion is not an issue in one dimension. However, since the annulus is by convention to the left of a
boundary, the technique is useful for deciding upon which side of a line a query point lies.

** $m = \lfloor \log_2(3^d - 1) \rfloor$, the largest integer with $2^m \le 3^d - 1$. The κ entries are therefore a
piecewise-linear interpolation of $\log_2(3^d - 1)$ between consecutive powers of two, slightly below the exact
value of Theorem 4.3 (e.g. 4.63 versus $\log_2 26 \approx 4.70$ for d = 3).
Table  2.  The  number  of  annulus  opcode  bits  required  to  decide  inclusion  in  higher  dimensions. 
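The following sketch compares the exact opcode length of Theorem 4.3 with the entries of Table 2 and with the whole number of bits a binary code needs. Our reading, which is an assumption, is that the table's fractional entries come from the general-row expression m - 1 + (3^d - 1)/2^m with m = ⌊log2(3^d - 1)⌋; this reproduces the tabulated values to within 0.01 and sits slightly below the exact logarithm. The storage constant uses w = 32, which matches the tabulated 1.03 for d = 1. Names are hypothetical.

```python
import math


def neighbor_count(d: int) -> int:
    return 3 ** d - 1


def opcode_bits_exact(d: int) -> float:
    """Theorem 4.3: log2(3^d - 1)."""
    return math.log2(neighbor_count(d))


def opcode_bits_table(d: int) -> float:
    """Table 2 reading: m - 1 + (3^d - 1)/2^m with m = floor(log2(3^d - 1))."""
    x = neighbor_count(d)
    m = x.bit_length() - 1  # largest m with 2**m <= x
    return m - 1 + x / 2 ** m


def opcode_bits_whole(d: int) -> int:
    """Whole bits needed to give each of the 3^d - 1 attachments a distinct code."""
    return (neighbor_count(d) - 1).bit_length()


def storage_constant(bits: float, w: int = 32) -> float:
    """Section 4.5: storage multiplier (w + kappa) / w."""
    return (w + bits) / w


for d in range(1, 7):
    print(d, neighbor_count(d), round(opcode_bits_exact(d), 2),
          opcode_bits_table(d), opcode_bits_whole(d),
          round(storage_constant(opcode_bits_table(d)), 2))
```

For d = 3 the exact value is log2 26 ≈ 4.70 bits, while an actual binary opcode needs 5 whole bits; in the plane, `storage_constant(3)` gives 35/32 ≈ 1.09, the multiplier quoted in section 2.9.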
5.0 CONCLUSIONS.


An optimized point-in-polygon algorithm with O(log n) query time and a negligible constant multiplier
has been presented for planar boundaries. As a bonus, the magnitude and direction of the normal
vector from the query point are returned along with the inclusion decision. The algorithm is based on a
topological structure called the inner annulus, which is demonstrated to be the union of the set of unit
normal vectors which point into the interior of a boundary. The algorithm operates by comparing the
respective distances of the query point from the boundary and annulus vectors; a smaller magnitude for
the annulus vector implies inclusion. The technique pointedly circumvents the finite precision problems
which plague implementations of other point-in-polygon algorithms such as the parity algorithm and
the winding number approach. It has been shown that for a planar subdivision, an opcode to attach the
annulus vector to a boundary coordinate may be precomputed and encoded in the upper three bits of
the coordinate. Hence, at run time, when the closest point on the boundary is computed, the
instruction to compute the annulus element is fetched along with it. It has been demonstrated that the
technique is extensible to higher dimensional objects. An axiomatic treatment of an inclusion decision
in d dimensions has been presented, and an inductive argument has demonstrated that the number of
opcode bits required to store the inclusion information is precisely $\log_2(3^d - 1)$ bits.